Abstract



Many emerging applications involve sparse signals, and their processing is a subject of active research. 
We desire a large class of sensing matrices which allow the user to discern important properties of 
the measured sparse signal. Of particular interest are matrices with the restricted isometry property 
(RIP). RIP matrices are known to enable efficient and stable reconstruction of sufficiently sparse 
signals, but the deterministic construction of such matrices has proven very difficult. In this thesis, 
we discuss this matrix design problem in the context of a growing field of study known as frame 
theory. In the first two chapters, we build large families of equiangular tight frames and full spark 
frames, and we discuss their relationship to RIP matrices as well as their utility in other aspects of 
sparse signal processing. In Chapter 3, we pave the road to deterministic RIP matrices, evaluating 
various techniques to demonstrate RIP, and making interesting connections with graph theory and 
number theory. We conclude in Chapter 4 with a coherence-based alternative to RIP, which provides 
near-optimal probabilistic guarantees for various aspects of sparse signal processing while at the same 
time admitting a whole host of deterministic constructions. 
0.1 Overview



In several applications, data is traditionally collected in massive quantities before employing a reasonable compression strategy. The result is a storage bottleneck that can be prevented with a data collection alternative known as compressed sensing. The philosophy behind compressed sensing is that we might as well target the meaningful data features up front instead of spending our storage budget on less-telling measurements. As an example, natural images tend to have a highly compressible wavelet decomposition because many of the wavelet coefficients are typically quite small. In this case, one might consider targeting large wavelet coefficients as desired image features; in fact, removing the contribution of the smallest wavelet coefficients will have little qualitative effect on the image [57], and so using sparsity in this way is intuitively reasonable.

Let $x$ be an unknown $N$-dimensional vector with the property that at most $K$ of its entries are nonzero, that is, $x$ is $K$-sparse. The goal of compressed sensing is to construct relatively few non-adaptive linear measurements along with a stable and efficient reconstruction algorithm that exploits this sparsity structure. Expressing each measurement as a row of an $M \times N$ matrix $\Phi$, we have the following noisy system:

$$y = \Phi x + z. \qquad (1)$$

In the spirit of compressed sensing, we only want a few measurements: $M \ll N$. Also, in order for there to exist an inversion process for (1), $\Phi$ must map $K$-sparse vectors injectively, or equivalently, every subcollection of $2K$ columns of $\Phi$ must be linearly independent. Unfortunately, the natural reconstruction method in this general case, i.e., finding the sparsest approximation of $y$ from the dictionary of columns of $\Phi$, is known to be NP-hard [108]. Moreover, the independence requirement does not impose any sort of dissimilarity between the columns of $\Phi$, meaning distinct identity basis elements could lead to similar measurements, thereby bringing instability in reconstruction.

To get around the NP-hardness of sparse approximation, we need more structure in the matrix $\Phi$. Indeed, several efficient reconstruction algorithms have been considered (e.g., Basis Pursuit [61, 62, 77], Orthogonal Matching Pursuit [62, 134], and the Least Absolute Shrinkage and Selection Operator [20]), and their original performance guarantees depend on the additional structure that the columns of $\Phi$ are nearly orthogonal to each other. Depending on the algorithm, this structure in the sensing matrix enables successful reconstruction when the noise term $z$ in (1) is zero, adversarial, or stochastic, but for any of the original guarantees to apply, the sparsity level must be $K = O(\sqrt{M})$. To reconstruct signals with larger sparsity levels, Candès and Tao [39] impose a much stronger requirement on the sensing matrix: that every submatrix of $2K$ columns of $\Phi$ be well-conditioned.
To be explicit, we have the following definition:

Definition 1. The matrix $\Phi$ has the $(K,\delta)$-restricted isometry property (RIP) if

$$(1-\delta)\|x\|^2 \le \|\Phi x\|^2 \le (1+\delta)\|x\|^2$$

for every $K$-sparse vector $x$. The smallest $\delta$ for which $\Phi$ is $(K,\delta)$-RIP is the restricted isometry constant (RIC) $\delta_K$.

In words, matrices which satisfy RIP act as a near-isometry on sufficiently sparse vectors. Among other things, this structure imposes near-orthogonality between the columns of $\Phi$, and so in light of the previous results, it is not surprising that RIP sensing matrices enable efficient reconstruction:

Theorem 2 (Theorem 1.3 in [34]). Suppose an $M \times N$ matrix $\Phi$ has the $(2K,\delta)$-restricted isometry property for some $\delta < \sqrt{2}-1$. Assuming $\|z\| \le \varepsilon$, then for every $K$-sparse vector $x \in \mathbb{R}^N$, the following reconstruction from (1):

$$\tilde{x} = \arg\min_{\hat{x}} \|\hat{x}\|_1 \quad \text{s.t.} \quad \|y - \Phi\hat{x}\| \le \varepsilon$$

satisfies $\|\tilde{x} - x\| \le C\varepsilon$, where $C$ only depends on $\delta$.

The exciting part about this guarantee is how the sparsity level $K$ of recoverable signals scales with the number of measurements $M$. Certainly, we expect at least $K \sim \sqrt{M}$, since RIP is a stronger matrix requirement than near-orthogonality between columns. In analyzing the sparsity level, random matrices have found the most success, specifically matrices with independent Gaussian or Bernoulli entries [17], or matrices whose rows were randomly selected from the discrete Fourier transform matrix [118]. With high probability, these random constructions support sparsity levels $K$ on the order of $\frac{M}{\log^\alpha N}$ for some $\alpha \ge 1$. Intuitively, this level of sparsity is near-optimal because $K$ cannot exceed $\frac{M}{2}$ by the linear independence condition. Thus, Theorem 2 is a substantial improvement over the previous guarantees, and this has prompted further investigation of RIP matrices. Unfortunately, it is difficult to check whether a particular instance of a random matrix is $(K,\delta)$-RIP, as this involves the calculation of singular values for all $\binom{N}{K}$ submatrices of $K$ columns of the matrix. For this reason, and for the sake of reliable sensing standards, many have pursued deterministic RIP matrix constructions; Tao discusses the significance of this open problem in [132].

Throughout this thesis, we consider the problem from a variety of directions. In Chapter 1, we observe a technique which is commonly used to analyze the restricted isometry of deterministic constructions: the Gershgorin circle theorem. This technique fails to demonstrate RIP for large sparsity levels; it is only capable of showing RIP for sparsity levels on the order of $\sqrt{M}$, as opposed to $M$. This limitation has become known as the "square-root bottleneck." To illustrate that this bottleneck is not merely an artifact of the Gershgorin analysis, we consider a construction which is optimal in the Gershgorin sense, and we establish that this construction is $(K,\delta)$-RIP for every $K < \delta\sqrt{M}$ but is not $(K, 1-\varepsilon)$-RIP for any $K > \sqrt{2M}$. The first claim is proved by the Gershgorin circle theorem, while the second uses the spark of the matrix, that is, the number of nonzero entries in the sparsest nonzero vector in its nullspace. While this disparity between $\sqrt{M}$ and $M$ is significant in many applications, such constructions are particularly well-suited for the sparse signal processing application of digital fingerprinting, and so we briefly investigate this application.

For the applications with larger sparsity levels, we note that spark deficiency is incompatible with restricted isometry; indeed, any matrix which is $(K, 1-\varepsilon)$-RIP necessarily has spark strictly greater than $K$. As such, in Chapter 2, we consider $M \times N$ full spark matrices, that is, matrices whose spark is as large as possible: $M+1$. We start by finding various full spark constructions using Vandermonde matrices and discrete Fourier transforms. These deterministic constructions are particularly attractive as RIP candidates because they satisfy the necessary condition of large spark, a property which is difficult to verify in general. To solidify this notion of difficulty, we also show that the problem of testing whether a matrix is full spark is hard for NP under randomized polynomial-time reductions; this contrasts with the similar problem of testing for RIP, which currently has unknown computational complexity [93]. To demonstrate that full spark matrices are useful in their own right, we use them to solve another important problem in sparse signal processing: signal recovery without phase.

To date, the only deterministic RIP construction that manages to go beyond the square-root bottleneck is given by Bourgain et al. [29]. In Chapter 3, we discuss the technique they use to demonstrate RIP. It is important to stress the significance of their contribution: Before [29], it was unclear how deterministic analysis might break the bottleneck, and as such, their result is a major theoretical achievement. On the other hand, their improvement over the square-root bottleneck is notably slight compared to what random matrices provide. However, we show that their technique can actually be used to demonstrate RIP for sparsity levels much larger than $\sqrt{M}$, meaning one could very well demonstrate random-like performance given the proper construction. Our result applies their technique to random matrices, and it inadvertently serves as a simple alternative proof that certain random matrices are RIP. We also introduce another technique, and we show that it can demonstrate RIP for similarly large sparsity levels. Later, we propose a specific class of full spark matrices as candidates for being RIP. Using a correspondence between these matrices and the Paley graphs, we observe certain combinatorial and number-theoretic implications; this lends some probabilistic intuition for a new bound on the clique number of Paley graphs of prime order.

After investigating deterministic RIP matrices in Chapters 1-3, we have yet to find deterministic $M \times N$ sensing matrices which provably allow for the efficient reconstruction of signals with sparsity level $K \sim \frac{M}{\log^\alpha N}$ for some $\alpha \ge 1$. To fill this gap, in Chapter 4, we consider an alternative model for the sparsity in our signal, namely, that the locations of the nonzero entries are drawn uniformly at random. With this model, we show that a particularly simple algorithm called one-step thresholding can reconstruct the signal with high probability provided $K = O(\frac{M}{\log N})$. In fact, this performance guarantee requires relatively modest structure in the sensing matrix: that the columns are nearly orthogonal to each other and well-distributed over the unit sphere. Indeed, this structural requirement is much less stringent than RIP, and we provide a catalog of random and deterministic sensing matrices which satisfy these conditions. Later, we further analyze the two conditions separately, finding new fundamental limits on near-orthogonality and illustrating how to manipulate a given sensing matrix to achieve good distribution over the sphere.
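The following is a minimal Python/NumPy sketch of the thresholding idea. It is written under our own simplifying assumptions and is not the exact algorithm analyzed in Chapter 4. The sparsity level $K$ is assumed known, and the support estimate keeps the $K$ largest entries of $|\Phi^* y|$ instead of comparing against an explicit threshold. A least-squares fit on that support then recovers the values. The function name `one_step_thresholding` and the example parameters are illustrative choices.

```python
import numpy as np

def one_step_thresholding(Phi, y, K):
    """Simplified one-step thresholding (assumes the sparsity level K is known).

    Returns the estimated support (sorted) and a least-squares estimate of x.
    """
    # One matrix-vector product: correlate y with every column of Phi.
    scores = np.abs(Phi.conj().T @ y)
    # Keep the K largest correlations as the support estimate.
    support = np.sort(np.argsort(scores)[-K:])
    # Refine the nonzero values by least squares on the estimated support.
    coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    x_hat = np.zeros(Phi.shape[1], dtype=coeffs.dtype)
    x_hat[support] = coeffs
    return support, x_hat


# Example: a uniformly random support (the model above) and random unit-norm columns.
rng = np.random.default_rng(1)
M, N, K = 512, 1024, 4
Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=0)           # unit-norm columns
true_support = np.sort(rng.choice(N, size=K, replace=False))
x = np.zeros(N)
x[true_support] = rng.choice([-1.0, 1.0], size=K)
y = Phi @ x                                  # noiseless measurements (z = 0)
est_support, x_hat = one_step_thresholding(Phi, y, K)
```

With $M = 512$ and $K = 4$, the correlations on the support stand well above those off the support, so in this noiseless example the support should be found exactly. Shrinking $M$ or growing $K$ makes failures increasingly likely.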

Throughout this thesis, we use ideas from frame theory, and so it is fitting to take some time to 
review the basics: 

0.2 A brief introduction to frame theory 

A frame is a sequence $\{\varphi_i\}_{i\in I}$ in a Hilbert space $H$ with frame bounds $0 < A \le B < \infty$ that satisfy

$$A\|x\|^2 \le \sum_{i\in I}|\langle x,\varphi_i\rangle|^2 \le B\|x\|^2 \qquad \forall x \in H.$$

Frames were introduced by Duffin and Schaeffer [64] in the context of nonharmonic Fourier analysis, where $H = L^2(-\pi,\pi)$ and the frame elements $\varphi_i$ are sinusoids of irregularly spaced frequencies. However, the modern application of frame theory to signal processing came decades later, after the landmark paper of Daubechies et al. [55]. This paper gave the first nontrivial examples of tight frames, that is, frames with equal frame bounds $A = B$. The utility of tight frames lies partially in their painless reconstruction formula:

$$x = \frac{1}{A}\sum_{i\in I}\langle x,\varphi_i\rangle\varphi_i \qquad \forall x \in H.$$
Note that orthonormal bases are tight frames with $A = B = 1$; in this way, frames form a natural and useful generalization. While this founding research in frame theory concerned frames over infinite-dimensional Hilbert spaces, many of today's applications of frames require a finite-dimensional treatment. In fact, finite frame theory has seen important progress in the past decade [18, 33, 42, 43, 47, 129], and the remainder of this section will discuss the basics of this field.

In finite dimensions, say, $H = \mathbb{C}^M$, a frame is given by the columns of a full-rank $M \times N$ matrix $\Phi = [\varphi_1 \cdots \varphi_N]$ with $N \ge M$. Here, the extreme eigenvalues of $\Phi\Phi^*$ are the frame bounds, and a tight frame has equal frame bounds; equivalently, a frame $\Phi$ is tight if

(i) the rows are equal-norm and orthogonal.

As established above, tight frames $\Phi$ are useful because they give a redundant linear encoding $y = \Phi^* x$ of a signal $x$ that permits painless recovery: $x = \frac{1}{A}\Phi y$, where $A$ is the common squared-norm of the rows. Constructing tight frames is rather simple: perform Gram-Schmidt on the rows of any frame, which orthogonalizes them with equal norms. For the sake of democracy in the entries of the encoding $y$, some applications opt for a unit norm tight frame (UNTF) [45], which has the additional property that

(ii) the columns are unit-norm. 
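The Gram-Schmidt recipe and painless recovery can be illustrated numerically with a short Python/NumPy sketch. It assumes a real or complex full-rank $M \times N$ input and uses `numpy.linalg.qr`, which orthonormalizes the same rows as Gram-Schmidt up to unimodular factors. The helper names `tight_frame_from_rows` and `painless_reconstruction` are hypothetical.

```python
import numpy as np

def tight_frame_from_rows(Phi, scale=1.0):
    """Orthonormalize the rows of a full-rank M x N matrix, then scale them.

    QR yields the same orthonormal rows as Gram-Schmidt up to unimodular
    factors, so Psi @ Psi^* = scale**2 * I: a tight frame with A = B = scale**2.
    """
    Q, _ = np.linalg.qr(Phi.conj().T)   # Phi^* = Q R, with Q (N x M) having orthonormal columns
    return scale * Q.conj().T           # rows of Q^* span the row space of Phi

def painless_reconstruction(Psi, y):
    """Recover x from y = Psi^* x via x = (1/A) Psi y; valid when Psi is tight."""
    A = np.linalg.norm(Psi[0]) ** 2     # common squared norm of the rows
    return (Psi @ y) / A


rng = np.random.default_rng(0)
Phi = rng.standard_normal((3, 7))       # a generic 3 x 7 frame
Psi = tight_frame_from_rows(Phi, scale=2.0)
x = rng.standard_normal(3)
y = Psi.conj().T @ x                    # redundant encoding: 7 coefficients
x_rec = painless_reconstruction(Psi, y)
```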

Constructing UNTFs has proven a bit more difficult, and there has been a lot of research to characterize these [18, 33, 127]. As a special example of a UNTF, take any rows from a discrete Fourier transform matrix and normalize the resulting columns. In addition to unit-norm tightness, it is often beneficial to have the columns of $\Phi$ be incoherent, and this occurs when $\Phi$ is an equiangular tight frame (ETF), that is, a UNTF with the final property that

(iii) the sizes of the inner products between distinct columns are equal. 

ETFs do not exist for all matrix dimensions [19], and there are only three general constructions 
to date [70, 141, 146]; these invoke block designs, strongly regular graphs, and difference sets, 
respectively. 
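The DFT example of a UNTF can be checked directly. The sketch below (Python/NumPy; `harmonic_frame` is a hypothetical helper name) keeps a chosen set of $M$ distinct rows of the $N \times N$ DFT matrix and divides by $\sqrt{M}$, so that every column has unit norm. The rows then remain orthogonal with common squared-norm $N/M$.

```python
import numpy as np

def harmonic_frame(N, rows):
    """Select the given (distinct) rows of the N x N DFT matrix and normalize the columns."""
    rows = np.asarray(rows)
    M = len(rows)
    n = np.arange(N)
    # Unnormalized DFT rows: every entry is unimodular, so each column has norm sqrt(M).
    F = np.exp(2j * np.pi * np.outer(rows, n) / N)
    return F / np.sqrt(M)


Phi = harmonic_frame(7, [0, 1, 2])      # a 3 x 7 UNTF
```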

To mitigate any confusion, the reader should be aware that throughout the literature, both UNTFs and ETFs are referred to as Welch-bound equality sequences [120]. As one might expect, each achieves equality in one of two important inequalities, and it is important to review them. Consider $M \times N$ matrices $\Phi = [\varphi_1 \cdots \varphi_N]$ which have (ii), but not necessarily (i) or (iii). As such, $\Phi$ might not be a frame, but we can still take the Hilbert-Schmidt norm of the Gram matrix of its columns:

$$\|\Phi^*\Phi\|_{\mathrm{HS}}^2 = \sum_{n=1}^{N}\sum_{n'=1}^{N}|\langle\varphi_n,\varphi_{n'}\rangle|^2.$$

This is oftentimes called the frame potential of $\Phi$ [18], and its significance will become apparent shortly. Since the columns of $\Phi$ have unit norm, and since $\Phi^*\Phi$ has at most $M$ nonzero eigenvalues $\lambda_1 \ge \cdots \ge \lambda_M$, we have

$$\|\Phi^*\Phi\|_{\mathrm{HS}}^2 = \sum_{m=1}^{M}\lambda_m^2 \ge \frac{1}{M}\bigg(\sum_{m=1}^{M}\lambda_m\bigg)^2 = \frac{1}{M}\big(\operatorname{Tr}\Phi^*\Phi\big)^2 = \frac{N^2}{M},$$

where the inequality follows from the Cauchy-Schwarz inequality with the all-ones vector. As such, equality is achieved if and only if the $M$ largest eigenvalues of $\Phi^*\Phi$ are equal; since these are also the eigenvalues of $\Phi\Phi^*$, this implies that $\Phi\Phi^*$ is a multiple of the identity, and so $\Phi$ satisfies (i). Thus, the frame potential of $\Phi$ satisfies $\|\Phi^*\Phi\|_{\mathrm{HS}}^2 \ge \frac{N^2}{M}$, with equality if and only if $\Phi$ is a UNTF. Some call this the Welch bound, and therefore say that UNTFs have Welch-bound equality.
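The frame potential bound can be observed numerically. In this Python/NumPy sketch, where the random seed is an arbitrary choice, a random matrix with unit-norm columns has frame potential strictly above $N^2/M$. A harmonic frame, which is a UNTF, meets the bound to within rounding error.

```python
import numpy as np

def frame_potential(Phi):
    """Squared Hilbert-Schmidt norm of the Gram matrix of the columns of Phi."""
    G = Phi.conj().T @ Phi
    return float(np.sum(np.abs(G) ** 2))


M, N = 3, 7
rng = np.random.default_rng(2)

# Unit-norm columns but (generically) not tight: strict inequality expected.
R = rng.standard_normal((M, N))
R /= np.linalg.norm(R, axis=0)

# A UNTF: the first M rows of the N x N DFT matrix, columns normalized.
U = np.exp(2j * np.pi * np.outer(np.arange(M), np.arange(N)) / N) / np.sqrt(M)

lower = N ** 2 / M                      # the bound N^2 / M
```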

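Another bound is also (more correctly) referred to as the Welch bound, and its derivation uses the previous one. It concerns the worst-case coherence of an $M \times N$ matrix $\Phi = [\varphi_1 \cdots \varphi_N]$ that satisfies (ii):

$$\mu := \max_{\substack{n,n' \in \{1,\ldots,N\}\\ n \ne n'}}|\langle\varphi_n,\varphi_{n'}\rangle|.$$

Since the columns of $\Phi$ have unit norm, we have

$$\frac{N^2}{M} \le \|\Phi^*\Phi\|_{\mathrm{HS}}^2 = N + \sum_{n=1}^{N}\sum_{\substack{n'=1\\ n' \ne n}}^{N}|\langle\varphi_n,\varphi_{n'}\rangle|^2 \le N + N(N-1)\mu^2.$$

Again, equality is achieved in the first inequality if and only if $\Phi$ satisfies (i). Also, equality is achieved in the second inequality if and only if $\Phi$ satisfies (iii). Rearranging gives the following: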

Theorem 3 (Welch bound [129, 143]). Every $M \times N$ matrix $\Phi$ with unit-norm columns has worst-case coherence

$$\mu \ge \sqrt{\frac{N-M}{M(N-1)}},$$

with equality if and only if $\Phi$ is an equiangular tight frame.
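To see the Welch bound in action, the sketch below (Python/NumPy; helper names are our own) computes the worst-case coherence $\mu$ and compares it with $\sqrt{(N-M)/(M(N-1))}$ for two standard ETFs. The first is three unit vectors at 120 degrees in $\mathbb{R}^2$. The second is the harmonic frame built from rows $\{1,2,4\}$ of the $7 \times 7$ DFT, which form a $(7,3,1)$ difference set in $\mathbb{Z}_7$.

```python
import numpy as np

def coherence(Phi):
    """Worst-case coherence: largest |<phi_n, phi_n'>| over distinct unit-norm columns."""
    G = np.abs(Phi.conj().T @ Phi)
    np.fill_diagonal(G, 0.0)            # ignore the diagonal (n = n')
    return float(G.max())

def welch_bound(M, N):
    return float(np.sqrt((N - M) / (M * (N - 1))))


# Three unit vectors at 120 degrees in R^2: a real 2 x 3 ETF.
angles = 2 * np.pi * np.arange(3) / 3
mercedes = np.vstack([np.cos(angles), np.sin(angles)])

# Harmonic frame from the (7, 3, 1) difference set {1, 2, 4} in Z_7: a 3 x 7 ETF.
D, N = np.array([1, 2, 4]), 7
diff_set_frame = np.exp(2j * np.pi * np.outer(D, np.arange(N)) / N) / np.sqrt(len(D))

# A random unit-norm matrix of the same size, for comparison.
rand = np.random.default_rng(4).standard_normal((3, 7))
rand /= np.linalg.norm(rand, axis=0)
```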

Equiangular lines have long been a subject of interest [97], and since equiangular tight frames have minimal coherence, they are particularly useful in a number of applications. Recent work on ETFs was spurred by results inspired by communication theory [26, 84, 129] that show that the linear encoders provided by ETFs are optimally robust against channel erasures. In the real setting, the existence of an ETF of a given size is equivalent to the existence of a strongly regular graph with certain corresponding parameters [84, 122]. Such graphs have a rich history and remain an active topic of research [31]; the specific ETFs which arise from particular graphs are detailed in [141]. Some of this theory generalizes to the complex-variable setting in the guise of complex Seidel matrices [25, 27, 65]. Many approaches to constructing ETFs have focused on the special case in which every entry of $\Phi$ is a root of unity [88, 115, 128, 130, 146]. Other approaches are given in [46, 125, 137]. In the complex setting, much attention has focused on the maximal case of $M^2$ vectors in $\mathbb{C}^M$ [9, 68, 91, 116, 121].

In the next chapter, we construct one of three known general families of ETFs, and we evaluate 
their performance as RIP matrices. Having reviewed the frame-theoretic background for this thesis, 
the interested reader is encouraged to discover more about frame theory in [49]. 
Chapter 1



Steiner equiangular tight frames 



In this chapter, we provide a new method for constructing equiangular tight frames (ETFs), that is, matrices $\Phi$ with orthogonal and equal-norm rows, and unit-norm columns whose inner products are equal in modulus. As discussed earlier, such frames have minimal worst-case coherence, and are therefore quite useful in applications. However, up to this point, they have proven notoriously difficult to construct. By contrast, the construction of Steiner equiangular tight frames is particularly simple: a tensor-like combination of a Steiner system and a regular simplex. This simplicity permits us to resolve an open question regarding ETFs and the restricted isometry property (RIP): we show that the RIP performance of some ETFs is unfortunately no better than the so-called "square-root bottleneck."

In the next section, we provide some simple tests for demonstrating whether a given matrix is RIP; not only will this clarify the notion of the square-root bottleneck, it will show how ETFs are in some sense optimal as deterministic RIP matrices, thereby motivating the construction of ETFs. Later, we provide the main result of this chapter, namely Theorem 7, which shows how certain Steiner systems may be combined with regular simplices to produce ETFs [69, 70]. In the third section, we discuss each of the known infinite families of such Steiner systems, and compute the corresponding infinite families of ETFs they generate. We further provide some necessary and asymptotically sufficient conditions, namely Theorem 8, to aid in the quest for discovering other examples of such frames that lie outside of the known infinite families. Finally, after demonstrating that Steiner ETFs fail to break the square-root bottleneck, we consider their application to the design of digital fingerprints to combat data piracy [103, 104].
1.1 Simple tests for restricted isometry

Before formally defining Steiner equiangular tight frames, we motivate their construction by reviewing a couple of common methods for determining whether a matrix is RIP:

Positive test for RIP: Apply the Gershgorin circle theorem to the submatrices $\Phi_\mathcal{K}^*\Phi_\mathcal{K}$.
Negative test for RIP: Find a sparse vector in the nullspace of $\Phi$.

In what follows, we discuss each of these tests in more detail, and later, we will use these tests to 
analyze Steiner ETFs as RIP matrices. 

1.1.1 Applying Gershgorin's circle theorem

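Take an $M \times N$ matrix $\Phi$, and recall Definition 1. For a given $K$, we wish to find some $\delta$ for which $\Phi$ is $(K,\delta)$-RIP. To this end, it is useful to consider the following expression for the restricted isometry constant:

Lemma 4. The smallest $\delta$ for which $\Phi$ is $(K,\delta)$-RIP is given by

$$\delta_K = \max_{\substack{\mathcal{K} \subseteq \{1,\ldots,N\}\\ |\mathcal{K}| = K}}\|\Phi_\mathcal{K}^*\Phi_\mathcal{K} - I_K\|_2, \qquad (1.1)$$

where $\Phi_\mathcal{K}$ denotes the submatrix consisting of columns of $\Phi$ indexed by $\mathcal{K}$, and $\|\cdot\|_2$ denotes the spectral norm.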

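Proof. We first note that $\Phi$ being $(K,\delta)$-RIP trivially implies that $\Phi$ is $(K,\delta+\varepsilon)$-RIP for every $\varepsilon > 0$. It therefore suffices to show that the expression for $\delta_K$ in (1.1) satisfies two criteria: (i) $\Phi$ is $(K,\delta_K)$-RIP, and (ii) $\Phi$ is not $(K,\delta)$-RIP for any $\delta < \delta_K$. To this end, pick some nonzero $K$-sparse vector $x$. To prove (i), we need to show that

$$(1-\delta_K)\|x\|^2 \le \|\Phi x\|^2 \le (1+\delta_K)\|x\|^2. \qquad (1.2)$$

Let $\mathcal{K} \subseteq \{1,\ldots,N\}$ be a size-$K$ set containing the support of $x$, and let $x_\mathcal{K}$ be the corresponding subvector, so that $\Phi x = \Phi_\mathcal{K}x_\mathcal{K}$ and $\|x\| = \|x_\mathcal{K}\|$. Then rearranging (1.2) gives

$$\frac{\big|\langle(\Phi_\mathcal{K}^*\Phi_\mathcal{K} - I_K)x_\mathcal{K},\, x_\mathcal{K}\rangle\big|}{\|x_\mathcal{K}\|^2} \le \delta_K. \qquad (1.3)$$

Since the expression for $\delta_K$ in (1.1) maximizes the left-hand side of (1.3) over all supports $\mathcal{K}$ and entry values $x_\mathcal{K}$, the inequality necessarily holds; that is, $\Phi$ is necessarily $(K,\delta_K)$-RIP. Furthermore, equality is achieved by the support $\mathcal{K}$ which maximizes (1.1) and the eigenvector $x_\mathcal{K}$ corresponding to the eigenvalue of largest magnitude of $\Phi_\mathcal{K}^*\Phi_\mathcal{K} - I_K$; this proves (ii). □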
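Lemma 4 turns the restricted isometry constant into a finite computation, although one over $\binom{N}{K}$ supports. The following is a brute-force Python/NumPy sketch for small matrices. The function name is hypothetical. It is applied here to $\Phi = [I\ F]$ with the unitary $4 \times 4$ DFT matrix $F$.

```python
import itertools
import numpy as np

def restricted_isometry_constant(Phi, K):
    """delta_K from (1.1), by brute force over all size-K column subsets.

    The number of subsets is binom(N, K), so this is only practical for tiny N.
    """
    N = Phi.shape[1]
    delta = 0.0
    for support in itertools.combinations(range(N), K):
        sub = Phi[:, list(support)]
        E = sub.conj().T @ sub - np.eye(K)   # Hermitian, so ||E||_2 = max |eigenvalue|
        delta = max(delta, float(np.max(np.abs(np.linalg.eigvalsh(E)))))
    return delta


# Example: Phi = [I F] with the unitary 4 x 4 DFT matrix F.
M = 4
F = np.fft.fft(np.eye(M)) / np.sqrt(M)
Phi = np.hstack([np.eye(M), F])
```

Here $\delta_2 = 1/2$ is the largest inner product between an identity column and a Fourier column. Meanwhile, $\delta_4 = 1$ because some four columns are linearly dependent.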

Note that we are not tasked with actually computing $\delta_K$; rather, we recognize that $\Phi$ is $(K,\delta)$-RIP for every $\delta \ge \delta_K$, and so we seek an upper bound on $\delta_K$. The following classical result offers a particularly easy-to-calculate bound on eigenvalues:

Theorem 5 (Gershgorin circle theorem [73]). For each eigenvalue $\lambda$ of a $K \times K$ matrix $A$, there is an index $i \in \{1,\ldots,K\}$ such that

$$\big|\lambda - A[i,i]\big| \le \sum_{\substack{j=1\\ j \ne i}}^{K}\big|A[i,j]\big|.$$



To use this theorem, take some $\Phi$ with unit-norm columns. Note that $\Phi_\mathcal{K}^*\Phi_\mathcal{K}$ is the Gram matrix of the columns indexed by $\mathcal{K}$, and as such, the diagonal entries are 1, and the off-diagonal entries are inner products between distinct columns of $\Phi$. Let $\mu$ denote the worst-case coherence of $\Phi = [\varphi_1 \cdots \varphi_N]$:

$$\mu := \max_{\substack{i,j \in \{1,\ldots,N\}\\ i \ne j}}|\langle\varphi_i,\varphi_j\rangle|.$$

Then the size of each off-diagonal entry of $\Phi_\mathcal{K}^*\Phi_\mathcal{K}$ is $\le \mu$, regardless of our choice for $\mathcal{K}$. Therefore, for every eigenvalue $\lambda$ of $\Phi_\mathcal{K}^*\Phi_\mathcal{K} - I_K$, the Gershgorin circle theorem gives

$$|\lambda| = |\lambda - 0| \le \sum_{\substack{j=1\\ j \ne i}}^{K}\big|(\Phi_\mathcal{K}^*\Phi_\mathcal{K})[i,j]\big| \le (K-1)\mu. \qquad (1.4)$$

Since (1.4) holds for every eigenvalue $\lambda$ of $\Phi_\mathcal{K}^*\Phi_\mathcal{K} - I_K$ and every choice of $\mathcal{K} \subseteq \{1,\ldots,N\}$, we conclude from (1.1) that $\delta_K \le (K-1)\mu$, i.e., $\Phi$ is $(K,(K-1)\mu)$-RIP. This process of using the Gershgorin circle theorem to demonstrate RIP for deterministic constructions has become standard in the community [8, 60, 70].
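The following Python/NumPy sketch compares the Gershgorin bound $\delta_K \le (K-1)\mu$ with the exact value from (1.1). The random unit-norm test matrix is chosen only for illustration. For $K = 2$ the two agree, since each $2 \times 2$ sub-Gram matrix has eigenvalues $1 \pm |\langle\varphi_i,\varphi_j\rangle|$. For larger $K$ the bound is typically loose, because it ignores cancellations.

```python
import itertools
import numpy as np

def coherence(Phi):
    G = np.abs(Phi.conj().T @ Phi)
    np.fill_diagonal(G, 0.0)
    return float(G.max())

def gershgorin_delta_bound(Phi, K):
    """Upper bound delta_K <= (K - 1) * mu from (1.4); assumes unit-norm columns."""
    return (K - 1) * coherence(Phi)

def exact_delta(Phi, K):
    """delta_K from (1.1) by brute force (feasible only for small N)."""
    best = 0.0
    for support in itertools.combinations(range(Phi.shape[1]), K):
        sub = Phi[:, list(support)]
        eigs = np.linalg.eigvalsh(sub.conj().T @ sub - np.eye(K))
        best = max(best, float(np.abs(eigs).max()))
    return best


rng = np.random.default_rng(3)
Phi = rng.standard_normal((8, 16))
Phi /= np.linalg.norm(Phi, axis=0)   # unit-norm columns, as (1.4) requires
```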

Recall that random RIP constructions support sparsity levels $K$ on the order of $\frac{M}{\log^\alpha N}$ for some $\alpha \ge 1$. To see how well the Gershgorin circle theorem demonstrates RIP, we need to express $\mu$ in terms of $M$ and $N$. To this end, we consider the Welch bound (Theorem 3):

$$\mu \ge \sqrt{\frac{N-M}{M(N-1)}}.$$

Since equiangular tight frames (ETFs) achieve equality in the Welch bound (as demonstrated in Section 0.2), we can further analyze what it means for an $M \times N$ ETF $\Phi$ to be $(K,(K-1)\mu)$-RIP. In particular, since Theorem 2 requires that $\Phi$ be $(2K,\delta)$-RIP for $\delta < \sqrt{2}-1$, it suffices to have $\frac{2K}{\sqrt{M}} < \sqrt{2}-1$, since this implies

$$\delta = (2K-1)\mu = (2K-1)\sqrt{\frac{N-M}{M(N-1)}} < \frac{2K}{\sqrt{M}} < \sqrt{2}-1. \qquad (1.5)$$

That is, ETFs form sensing matrices that support sparsity levels $K$ on the order of $\sqrt{M}$. Most other deterministic constructions have identical bounds on sparsity levels [8, 60, 70]. In fact, since ETFs minimize coherence, they are necessarily optimal constructions in terms of the Gershgorin demonstration of RIP, but the question remains whether they are actually RIP for larger sparsity levels; the Gershgorin demonstration fails to account for cancellations in the sub-Gram matrices $\Phi_\mathcal{K}^*\Phi_\mathcal{K}$, and so this technique is too weak to indicate either possibility.
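The sufficient condition behind (1.5) is easy to evaluate. This plain-Python sketch (the function names are hypothetical) finds the largest $K$ with $(2K-1)\mu < \sqrt{2}-1$ when $\mu$ equals the Welch bound. That is the sparsity level the Gershgorin argument certifies for an $M \times N$ ETF. The sketch uses the exact Welch value rather than the simpler bound $2K/\sqrt{M}$.

```python
import math

def welch_bound(M, N):
    return math.sqrt((N - M) / (M * (N - 1)))

def gershgorin_certified_sparsity(M, N):
    """Largest K such that (2K - 1) * mu < sqrt(2) - 1 for an M x N ETF.

    mu is the Welch bound (attained by ETFs), (2K - 1) * mu bounds delta_{2K}
    by (1.4), and Theorem 2 then applies. Assumes N > M so that mu > 0.
    """
    if N <= M:
        raise ValueError("need N > M")
    mu = welch_bound(M, N)
    K = 0
    while (2 * (K + 1) - 1) * mu < math.sqrt(2) - 1:
        K += 1
    return K


small = gershgorin_certified_sparsity(6, 16)         # returns 1
large = gershgorin_certified_sparsity(10000, 20000)  # returns 29
```

Even with $M = 10{,}000$ measurements, the argument only certifies $K = 29$. This is the square-root bottleneck in concrete form.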

1.1.2 Spark considerations

Recall that, in order for an inversion process for (1) to exist, $\Phi$ must map $K$-sparse vectors injectively, or equivalently, every subcollection of $2K$ columns of $\Phi$ must be linearly independent. This linear independence condition can be nicely expressed in more general terms, as the following definition provides:

Definition 6. The spark of a matrix $\Phi$ is the size of the smallest linearly dependent subset of columns, i.e.,

$$\operatorname{Spark}(\Phi) = \min\{\|x\|_0 : \Phi x = 0,\ x \ne 0\}.$$

This definition was introduced by Donoho and Elad [61] to help build a theory of sparse representation that later gave birth to modern compressed sensing. The concept of spark is also found in matroid theory, where it goes by the name girth. The condition that every subcollection of $2K$ columns of $\Phi$ is linearly independent is equivalent to $\operatorname{Spark}(\Phi) > 2K$. Relating spark to RIP, suppose $\Phi$ is $(K,\delta)$-RIP with $\operatorname{Spark}(\Phi) \le K$. Then there exists a nonzero $K$-sparse vector $x$ such that $(1-\delta)\|x\|^2 \le \|\Phi x\|^2 = 0$, and so $\delta \ge 1$. The reason behind this stems from our necessary linear independence condition: RIP implies linear independence, and so small spark implies linear dependence, which in turn implies not RIP.

As an example of using spark to test RIP, consider the $M \times 2M$ matrix $\Phi = [I\ F]$ that comes from concatenating the identity matrix $I$ with the unitary discrete Fourier transform matrix $F$. In this example, columns from a common orthonormal basis are orthogonal, while columns from different bases have an inner product of size $\frac{1}{\sqrt{M}}$. As such, the Gershgorin analysis gives that $\Phi$ is $(K,\delta)$-RIP for all $\delta \ge \frac{K-1}{\sqrt{M}}$. However, when $M$ is a perfect square, the Dirac comb $x$ of $\sqrt{M}$ Kronecker deltas spaced $\sqrt{M}$ apart is an eigenvector of $F$, and so concatenating $Fx$ with $-x$ produces a $2\sqrt{M}$-sparse vector in the nullspace of $\Phi$. In other words, $\operatorname{Spark}(\Phi) \le 2\sqrt{M}$, and so $\Phi$ is not $(K,1-\varepsilon)$-RIP for any $K \ge 2\sqrt{M}$. After building Steiner equiangular tight frames, we will see that they perform similarly as RIP matrices.
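The spark argument for $[I\ F]$ can be checked by brute force on a tiny case. In this Python/NumPy sketch, the `spark` helper and its rank tolerance `tol` are our own assumptions, and exhaustive search is only feasible for very small $N$. With $M = 4$, the Dirac comb consists of $\sqrt{M} = 2$ deltas.

```python
import itertools
import math
import numpy as np

def spark(Phi, tol=1e-9):
    """Size of the smallest linearly dependent set of columns (exhaustive search)."""
    M, N = Phi.shape
    for size in range(1, min(M + 1, N) + 1):
        for cols in itertools.combinations(range(N), size):
            if np.linalg.matrix_rank(Phi[:, list(cols)], tol=tol) < size:
                return size
    return math.inf  # no dependent subset: all columns are linearly independent


M = 4                                    # a perfect square
F = np.fft.fft(np.eye(M)) / np.sqrt(M)   # unitary DFT matrix
Phi = np.hstack([np.eye(M), F])

# Dirac comb: sqrt(M) Kronecker deltas spaced sqrt(M) apart.
comb = np.zeros(M)
comb[:: math.isqrt(M)] = 1.0
null_vec = np.concatenate([F @ comb, -comb])   # Phi @ null_vec = F comb - F comb = 0
```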

1.2 Constructing Steiner equiangular tight frames 

Steiner systems and block designs have been studied for over a century; the background facts pre- 
sented here on these topics are taken from [1, 52]. In short, a {v, b, r, k, X)-block design is a u-element 
set V along with a collection B of 6 size-A; subsets of V, dubbed blocks, that have the property that 
any element of V lies in exactly r blocks and that any 2-element subset of V is contained in exactly 
A blocks. The corresponding incidence matrix is a u x 6 matrix A that is one in a given entry if that 
block contains the corresponding point, and is otherwise zero; in this chapter, it is more convenient 
for us to work with the b x v transpose of this incidence matrix. Our particular construction 
of ETFs involves a special class of block designs known as (2, k, v)-Steiner systems. These have the 
property that any 2-element subset of V is contained in exactly one block, that is, A = 1. With 
respect to our purposes, the crucial facts are the following: 

The transpose $A^\top$ of the $\{0,1\}$-incidence matrix $A$ of a $(2,k,v)$-Steiner system:

(i) is of size $\frac{v(v-1)}{k(k-1)} \times v$,

(ii) has $k$ ones in each row,

(iii) has $\frac{v-1}{k-1}$ ones in each column, and

(iv) has the property that any two of its columns have an inner product of one.
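Facts (i)-(iv) are straightforward to verify mechanically for a given list of blocks. The sketch below (Python/NumPy; helper names are hypothetical) builds $A^\top$ from the blocks and checks each fact. It uses two examples: the Fano plane, a $(2,3,7)$-Steiner system, and the $(2,2,4)$-Steiner system of all pairs from a 4-element set.

```python
import itertools
import numpy as np

def transposed_incidence(blocks, v):
    """b x v {0,1}-matrix whose rows are the blocks, i.e., A^T."""
    AT = np.zeros((len(blocks), v), dtype=int)
    for row, block in enumerate(blocks):
        AT[row, list(block)] = 1
    return AT

def check_steiner_facts(AT, k):
    """Truth values of facts (i)-(iv) for a candidate (2, k, v)-Steiner system."""
    b, v = AT.shape
    r = (v - 1) // (k - 1)
    gram = AT.T @ AT
    off_diagonal = gram[~np.eye(v, dtype=bool)]
    return (
        b == v * (v - 1) // (k * (k - 1)),   # (i) number of rows
        bool(np.all(AT.sum(axis=1) == k)),   # (ii) k ones per row
        bool(np.all(AT.sum(axis=0) == r)),   # (iii) r ones per column
        bool(np.all(off_diagonal == 1)),     # (iv) distinct columns share one block
    )


# The (2, 3, 7)-Steiner system (Fano plane), points labeled 0, ..., 6.
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
# The (2, 2, 4)-Steiner system: all pairs from a 4-element set.
pairs = list(itertools.combinations(range(4), 2))
```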

The first three facts follow immediately from solving for $b = \frac{v(v-1)}{k(k-1)}$ and $r = \frac{v-1}{k-1}$, using the well-known relations $vr = bk$ and $r(k-1) = \lambda(v-1)$. Meanwhile, (iv) comes from the fact that $\lambda = 1$: each column of $A^\top$ corresponds to an element of the set, and the inner product of any two columns computes the number of blocks that contain the corresponding pair of points. This in hand, we present the main result of this chapter; here, the density of a matrix is the ratio of the number of nonzero entries of that matrix to the total number of its entries:






Theorem 7. Every $(2,k,v)$-Steiner system generates an equiangular tight frame consisting of $N = v\big(1+\frac{v-1}{k-1}\big)$ vectors in $M = \frac{v(v-1)}{k(k-1)}$-dimensional space with redundancy $\frac{N}{M} = k\big(1+\frac{k-1}{v-1}\big)$ and density

$$\frac{k}{v} = \left(\frac{N-1}{M(N-M)}\right)^{1/2}.$$

Moreover, if there exists a real Hadamard matrix of size $1+\frac{v-1}{k-1}$, then such frames are real. Specifically, a $\frac{v(v-1)}{k(k-1)} \times v\big(1+\frac{v-1}{k-1}\big)$ ETF matrix $\Phi$ may be constructed as follows:

1. Let $A^\top$ be the $\frac{v(v-1)}{k(k-1)} \times v$ transpose of the incidence matrix of a $(2,k,v)$-Steiner system.

2. For each $j = 1,\ldots,v$, let $H_j$ be any $\big(1+\frac{v-1}{k-1}\big) \times \big(1+\frac{v-1}{k-1}\big)$ matrix that has orthogonal rows and unimodular entries, such as a possibly complex Hadamard matrix.

3. For each $j = 1,\ldots,v$, let $\Phi_j$ be the $\frac{v(v-1)}{k(k-1)} \times \big(1+\frac{v-1}{k-1}\big)$ matrix obtained from the $j$th column of $A^\top$ by replacing each of the one-valued entries with a distinct row of $H_j$, and every zero-valued entry with a row of zeros.

4. Concatenate and rescale the $\Phi_j$'s to form $\Phi = \big(\frac{k-1}{v-1}\big)^{1/2}[\Phi_1 \cdots \Phi_v]$.

It is important to note that a version of this ETF construction was previously employed by Seidel 
in Theorem 12.1 of [122] to prove the existence of certain strongly regular graphs. In the context of 
that result, our contributions are as follows: (i) the realization that when Seidel's block design arises 
from a particular type of Steiner system, the resulting strongly regular graph indeed corresponds to 
a real ETF; (ii) noting that in this case, the graph theory may be completely bypassed, as the idea
itself directly produces the requisite frame; and (iii) having bypassed the graph theory, realizing
that this construction immediately generalizes to the complex-variable setting if Seidel's requisite
Hadamard matrix is permitted to become complex. These realizations permit us to exploit the vast
literature on Steiner systems [52] to construct several new infinite families of ETFs, in both the real
and complex settings. Moreover, these ETFs are extremely sparse in their native space; sparse tight
frames have recently become a subject of interest in their own right [44].

We refer to the ETFs produced by Theorem 7 as $(2,k,v)$-Steiner ETFs. In essence, the idea of the construction is that the submatrix formed by the nonzero rows of any particular $\Phi_j$ has columns that form a regular simplex in $\frac{v-1}{k-1}$-dimensional space. These vectors are automatically equiangular amongst themselves; by requiring the entries of these simplices to be unimodular, and requiring that distinct blocks have only one entry of mutual support, one can further control the inner products of vectors arising from distinct blocks. This idea is best understood by considering a simple example, such as the ETF that arises from a $(2,2,4)$-Steiner system whose transposed incidence matrix is

$$A^\top = \left[\begin{array}{cccc}
+ & + & & \\
+ & & + & \\
+ & & & + \\
& + & + & \\
& + & & + \\
& & + & +
\end{array}\right].$$



One can immediately verify that $A^\top$ corresponds to a block design. There is a set $V$ of $v = 4$ elements, each corresponding to a column of $A^\top$, and a collection $\mathcal{B}$ of $b = 6$ subsets of $V$, each corresponding to a row of $A^\top$. Every row contains $k = 2$ elements, and every column contains $r = 3$ ones. Any given pair of elements is contained in exactly one row, that is, $\lambda = 1$, a fact which is equivalent to having the inner product of any two distinct columns of $A^\top$ being 1. To form an ETF, for each of the four columns of $A^\top$ we must choose a $4 \times 4$ matrix $H$ with unimodular entries and orthogonal rows; the size of $H$ is always one more than the number $r$ of ones in a given column of $A^\top$. Though in principle one may choose a different $H$ for each column, we choose them all to be the same, namely the Hadamard matrix:



$$H = \left[\begin{array}{cccc}
+ & + & + & + \\
+ & - & + & - \\
+ & + & - & - \\
+ & - & - & +
\end{array}\right].$$



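To form the ETF, for each column of $A^\top$ we replace each of its 1-valued entries with a distinct row of $H$. Again, though in principle one may choose a different sequence of rows of $H$ for each column, we simply decide to use the second, third and fourth rows, in that order. The result is a real ETF of $N = 16$ elements of dimension $M = 6$:

$$\Phi = \frac{1}{\sqrt{3}}\left[\begin{array}{cccccccccccccccc}
+ & - & + & - & + & - & + & - & & & & & & & & \\
+ & + & - & - & & & & & + & - & + & - & & & & \\
+ & - & - & + & & & & & & & & & + & - & + & - \\
& & & & + & + & - & - & + & + & - & - & & & & \\
& & & & + & - & - & + & & & & & + & + & - & - \\
& & & & & & & & + & - & - & + & + & - & - & +
\end{array}\right]. \qquad (1.6)$$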

One can immediately verify that the rows of $\Phi$ are orthogonal and have constant norm, implying $\Phi$ is indeed a tight frame. One can also easily see that the inner products of two columns from the same block are $-\frac{1}{3}$, while the inner products of columns from distinct blocks are $\pm\frac{1}{3}$. Theorem 7 states that this behavior holds in general for any appropriate choice of $A^{\mathrm{T}}$ and H.
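The example can be checked numerically. The sketch below is our own illustration in Python with NumPy, not code from the original work, and the helper `steiner_block` is a hypothetical name. It builds $\Phi$ from $A^{\mathrm{T}}$ and H exactly as described above, including the $1/\sqrt{3}$ scaling, and confirms tightness together with the inner products $-\frac{1}{3}$ and $\pm\frac{1}{3}$.

```python
import numpy as np

# Illustrative sketch; steiner_block is a hypothetical helper name.
# Transposed incidence matrix A^T of the (2,2,4)-Steiner system:
# rows are the 6 two-element blocks of {0,1,2,3}, columns are the 4 points.
AT = np.array([[1, 1, 0, 0],
               [1, 0, 1, 0],
               [1, 0, 0, 1],
               [0, 1, 1, 0],
               [0, 1, 0, 1],
               [0, 0, 1, 1]])

# The 4 x 4 real Hadamard matrix H from the text.
H = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1],
              [1, -1, -1,  1]])

def steiner_block(col, H, rows_of_H):
    """Replace the 1-valued entries of one column of A^T (top to bottom)
    with the listed rows of H; 0-valued entries become rows of zeros."""
    block = np.zeros((col.size, H.shape[1]))
    for i, h in zip(np.flatnonzero(col), rows_of_H):
        block[i] = H[h]
    return block

r = int(AT[:, 0].sum())  # number of ones in each column of A^T (r = 3)
# Use the second, third and fourth rows of H (indices 1, 2, 3) for every column.
Phi = np.hstack([steiner_block(AT[:, j], H, [1, 2, 3])
                 for j in range(AT.shape[1])]) / np.sqrt(r)

M, N = Phi.shape   # (6, 16)
G = Phi.T @ Phi    # Gram matrix of the frame vectors (columns of Phi)
print(np.round(np.sqrt(3) * Phi).astype(int))  # the sign pattern of (1.6)
```

The printed sign pattern can be compared entry by entry with (1.6).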

Proof of Theorem 7. To verify $\Phi$ is a tight frame, note that the inner product of any two distinct rows of $\Phi$ is zero, as it is the sum of the inner products of the corresponding rows of the $\Phi_j$'s over all $j = 1, \ldots, v$; for any j, these shorter inner products are necessarily zero, as they either correspond to inner products of distinct rows of $H_j$ or to inner products with zero vectors. Moreover, the rows of $\Phi$ have constant norm: as noted in (ii) above, each row of $A^{\mathrm{T}}$ contains k ones; since each $H_j$ has unimodular entries, the squared-norm of any row of $\Phi$ is the squared-scaling factor $\frac{k-1}{v-1}$ times a sum of $k(1 + \frac{v-1}{k-1})$ ones, which, as is necessary for any unit norm tight frame, equals the redundancy $\frac{N}{M} = k(1 + \frac{k-1}{v-1})$.
Having that $\Phi$ is tight, we show $\Phi$ is also equiangular. We first note that the columns of $\Phi$ have unit norm: the squared-norm of any column of $\Phi$ is $\frac{k-1}{v-1}$ times the squared-norm of a column of one of the $\Phi_j$'s; since the entries of $H_j$ are unimodular and (iii) above gives that each column of $A^{\mathrm{T}}$ contains $\frac{v-1}{k-1}$ ones, the squared-norm of any column of $\Phi$ is $(\frac{k-1}{v-1})(\frac{v-1}{k-1}) \cdot 1 = 1$, as claimed. Moreover, the inner products of any two distinct columns of $\Phi$ have constant modulus. Indeed, the fact (iv) that any two distinct columns of $A^{\mathrm{T}}$ have but a single entry of mutual support implies the same is true for columns of $\Phi$ that arise from distinct $\Phi_j$ blocks, implying the inner product of such columns is $\frac{k-1}{v-1}$ times the product of two unimodular numbers. That is, the squared-magnitude of the inner products of two columns that arise from distinct blocks is

$$\Big(\frac{k-1}{v-1}\Big)^2 = \frac{N-M}{M(N-1)},$$

as needed. Meanwhile, the same holds true for columns that arise from the same block $\Phi_j$. To see this, note that since $H_j$ is a scalar multiple of a unitary matrix, its columns are orthogonal. Moreover, $\Phi_j$ contains all but one of $H_j$'s rows, namely one for each of the 1-valued entries of $A^{\mathrm{T}}$, à la (iii). Thus, the inner products of the portions of $H_j$ that lie in $\Phi_j$ are their entire inner product of zero, less the contribution from the left-over entries. Overall, the inner product of two columns of $\Phi$ that arise from the same $\Phi_j$ block is $\frac{k-1}{v-1}$ times the negated product of one entry of $H_j$ and the conjugate of another; since the entries of $H_j$ are unimodular, we have that the squared-magnitude of such inner products is $(\frac{k-1}{v-1})^2 = \frac{N-M}{M(N-1)}$, as needed.

Thus $\Phi$ is an ETF. Moreover, as noted above, its redundancy is $\frac{N}{M} = k(1 + \frac{k-1}{v-1})$. All that remains to verify is its density: as the entries of each $H_j$ are all nonzero, the proportion of $\Phi$'s nonzero entries is the same as that of the incidence matrix A, which is clearly $\frac{k}{v}$, having k ones in each v-dimensional row of $A^{\mathrm{T}}$. Moreover, substituting $N = v(1 + \frac{v-1}{k-1})$ and $M = \frac{v(v-1)}{k(k-1)}$ into the quantity $\frac{N-1}{M(N-M)}$ reveals it to be $\frac{k^2}{v^2}$, and so the density can be alternatively expressed as $(\frac{N-1}{M(N-M)})^{1/2}$. □
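The proof allows a different $H_j$ for each column, and a small numerical experiment makes that freedom concrete. The sketch below is our own, assuming Python with NumPy; `dft`, `steiner_etf` and `random_phase_diag` are hypothetical helper names. Each $H_j$ is a DFT matrix multiplied on both sides by random diagonal phase matrices, which keeps the entries unimodular and the rows orthogonal, and the test design is the (2, 2, 5)-Steiner system of all pairs from a 5-element set. The script checks tightness, unit-norm columns, the constant modulus $\frac{1}{r}$ of the inner products, and the density $\frac{k}{v} = (\frac{N-1}{M(N-M)})^{1/2}$.

```python
import numpy as np
from itertools import combinations

# Illustrative sketch; all helper names are hypothetical.
def dft(n):
    """n x n DFT matrix: unimodular entries and orthogonal rows."""
    idx = np.arange(n)
    return np.exp(2j * np.pi * np.outer(idx, idx) / n)

def steiner_etf(AT, H_list):
    """Sketch of the Theorem 7 construction.

    AT     : b x v transposed incidence matrix of a (2,k,v)-Steiner system.
    H_list : v matrices, each (1+r) x (1+r), unimodular with orthogonal rows.
    Row 0 of each H_j is discarded; rows 1..r fill the 1-valued entries of
    column j of AT from top to bottom.
    """
    b, v = AT.shape
    r = int(AT[:, 0].sum())
    blocks = []
    for j in range(v):
        block = np.zeros((b, r + 1), dtype=complex)
        for h, i in enumerate(np.flatnonzero(AT[:, j]), start=1):
            block[i] = H_list[j][h]
        blocks.append(block)
    return np.hstack(blocks) / np.sqrt(r)

# (2,2,5)-Steiner system: all 2-element subsets of {0,...,4}; here 1 + r = v.
v = 5
pairs = list(combinations(range(v), 2))
AT = np.zeros((len(pairs), v), dtype=int)
for i, blk in enumerate(pairs):
    AT[i, list(blk)] = 1

rng = np.random.default_rng(0)

def random_phase_diag(n):
    """Diagonal matrix of random unimodular phases."""
    return np.diag(np.exp(2j * np.pi * rng.random(n)))

# A different H_j for every column j of A^T.
H_list = [random_phase_diag(v) @ dft(v) @ random_phase_diag(v) for _ in range(v)]
Phi = steiner_etf(AT, H_list)
M, N = Phi.shape                      # (10, 25)
G = Phi.conj().T @ Phi
density = np.count_nonzero(Phi) / Phi.size
```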

In the next section, we apply Theorem 7 to produce several infinite families of Steiner ETFs. Before doing so, however, we pause to remark on the redundancy and sparsity of such frames. In particular, note that since the parameters k and v of the requisite Steiner system always satisfy $2 \le k < v$, the redundancy $k(1 + \frac{k-1}{v-1})$ of Steiner ETFs is always between k and 2k; the redundancy is therefore on the order of k, and is always strictly greater than 2. If a low-redundancy ETF is desired, one can always take the Naimark complement [43] of an ETF of N elements in M-dimensional space to produce a new ETF of N elements in $(N-M)$-dimensional space; though the complement process does not preserve sparsity, it nevertheless transforms any Steiner ETF into a new ETF whose redundancy is strictly less than 2. However, such a loss of sparsity should not be taken lightly. Indeed, the low density of Steiner ETFs gives them a large computational advantage over their non-sparse brethren.
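The Naimark complement mentioned above can be computed directly from the Gram matrix. The sketch below is our own, assuming Python with NumPy; `naimark_complement` is a hypothetical helper. It relies on the standard fact that, for a unit norm tight frame $\Phi$ of N vectors in M dimensions, the matrix $\frac{N}{N-M}(I - \frac{M}{N}\Phi^*\Phi)$ is the Gram matrix of a unit norm tight frame of N vectors in $N - M$ dimensions, whose off-diagonal entries are those of $\Phi^*\Phi$ scaled by $-\frac{M}{N-M}$. Applied to the 6 × 16 Steiner ETF of (1.6), it yields a dense 10 × 16 ETF of redundancy 1.6 < 2.

```python
import numpy as np

# Illustrative sketch; naimark_complement is a hypothetical helper name.
def naimark_complement(Phi):
    """Return an (N-M) x N unit norm tight frame Psi whose Gram matrix is
    N/(N-M) * (I - (M/N) Phi^* Phi), assuming Phi is an M x N unit norm tight frame."""
    M, N = Phi.shape
    G = Phi.conj().T @ Phi
    Gc = (N / (N - M)) * (np.eye(N) - (M / N) * G)
    w, V = np.linalg.eigh(Gc)                 # eigenvalues in ascending order
    w, V = w[-(N - M):], V[:, -(N - M):]      # keep the N - M nonzero eigenvalues
    return np.sqrt(w)[:, None] * V.conj().T   # Psi^* Psi = V diag(w) V^* = Gc

# The 6 x 16 Steiner ETF of (1.6): rows 2-4 of the 4 x 4 Hadamard matrix placed
# according to the transposed incidence matrix of the (2,2,4)-Steiner system.
H = np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]])
AT = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
               [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]])
Phi = np.zeros((6, 16))
for j in range(4):
    for h, i in enumerate(np.flatnonzero(AT[:, j]), start=1):
        Phi[i, 4 * j:4 * j + 4] = H[h]
Phi /= np.sqrt(3)

Psi = naimark_complement(Phi)   # 10 x 16
Gp = Psi.conj().T @ Psi
```

The coherence of the complement is $\frac{6}{10}\cdot\frac{1}{3} = 0.2$, which matches the Welch bound $\sqrt{\frac{N-M}{M(N-1)}}$ for a 10 × 16 frame.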

To clarify, the most common operation in frame-theoretic applications is the evaluation of the analysis operator $\Phi^*$ on a given $x \in \mathbb{C}^M$. For a non-sparse $\Phi$, this act of computing $\Phi^*x$ requires O(MN) operations; for a frame $\Phi$ of density D, this cost is reduced to O(DMN). Indeed, using the explicit value of $D = (\frac{N-1}{M(N-M)})^{1/2}$ given in Theorem 7 as well as the aforementioned fact that the redundancy of such frames necessarily satisfies $\frac{N}{M} > 2$, we see that the cost of evaluating $\Phi^*x$ when $\Phi$ is a Steiner ETF is on the order of $DMN = (\frac{M(N-1)}{N-M})^{1/2} N < (2M)^{1/2} N$ operations, a dramatic cost savings when M is large. Further efficiency is gained when $\Phi$ is real, as its nonzero elements are but a fixed scaling factor times the entries of a real Hadamard matrix, implying $\Phi^*x$ can be evaluated using only additions and subtractions. The fact that every entry of the rescaled matrix $(\frac{v-1}{k-1})^{1/2}\Phi$ is either 0 or ±1 further makes real Steiner ETFs potentially useful for applications that require binary measurements, such as design of experiments.
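The add-and-subtract evaluation of $\Phi^*x$ for a real Steiner ETF can be sketched as follows. This is our own illustration, assuming Python with NumPy; `signed_supports` and `analysis_add_sub` are hypothetical helpers. Since $\sqrt{3}\,\Phi$ has entries in {0, ±1} for the 6 × 16 example of (1.6), each entry of $\Phi^*x$ is a signed sum of r = 3 entries of x followed by one global rescaling. The number of entries touched is DMN = 48, which agrees with $(\frac{M(N-1)}{N-M})^{1/2}N$ and sits below the bound $\sqrt{2M}\,N \approx 55.4$.

```python
import numpy as np

# Illustrative sketch; helper names are hypothetical.
def signed_supports(S):
    """For a matrix S with entries in {0, +1, -1}, list per column the rows
    holding +1 and the rows holding -1."""
    return [(np.flatnonzero(c > 0), np.flatnonzero(c < 0)) for c in S.T]

def analysis_add_sub(supports, x, scale):
    """Evaluate Phi^T x where Phi = scale * S: each entry is a signed sum of
    entries of x (additions/subtractions only), then one global rescaling."""
    return scale * np.array([x[p].sum() - x[m].sum() for p, m in supports])

# S = sqrt(3) * Phi for the 6 x 16 Steiner ETF of (1.6).
H = np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]])
AT = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
               [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]])
S = np.zeros((6, 16), dtype=int)
for j in range(4):
    for h, i in enumerate(np.flatnonzero(AT[:, j]), start=1):
        S[i, 4 * j:4 * j + 4] = H[h]

supports = signed_supports(S)
M, N = S.shape
nnz = sum(len(p) + len(m) for p, m in supports)   # entries touched: D*M*N
x = np.arange(1.0, M + 1)
coeffs = analysis_add_sub(supports, x, 1 / np.sqrt(3))
```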
1.3 Examples of Steiner equiangular tight frames

In this section, we apply Theorem 7 to produce several infinite families of Steiner ETFs. When designing frames for real-world applications, three considerations reign supreme: size, redundancy and sparsity. As noted above, every Steiner ETF is very sparse, a serious computational advantage in high-dimensional signal processing. Moreover, some of these infinite families, such as those arising from finite affine and projective geometries, provide great flexibility in choosing the ETF's size and redundancy. Indeed, these constructions provide the first known guarantee that for a given application, one is always able to find ETFs whose frame elements lie in a space whose dimension matches, up to an order of magnitude, that of one's desired class of signals, while simultaneously permitting one to have an almost arbitrary fixed level of redundancy, a handy weapon in the fight against noise. To be clear, recall that the redundancy of a Steiner ETF is always strictly greater than 2. Moreover, general bounds on the maximal number of equiangular lines [97] require that any real M × N ETF satisfy $N \le \frac{M(M+1)}{2}$ and any complex ETF satisfy $N \le M^2$; thus, the redundancy of an ETF is never truly arbitrary. Nevertheless, if one prescribes a given level of redundancy in advance, the Steiner method can produce arbitrarily large ETFs whose redundancy is approximately the prime power closest to the desired level.

1.3.1 Infinite families of Steiner equiangular tight frames 

We now detail eight infinite families of ETFs, each generated by applying Theorem 7 to one of the eight completely understood infinite families of (2, k, v)-Steiner systems. Table 1.1 summarizes the most important features of each family, and Table 1.2 gives the first few examples of each type, namely those that lie in 100 dimensions or less.
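Since every family below comes from plugging particular (k, v) into Theorem 7, the entries of Tables 1.1 and 1.2 can be regenerated from the two formulas $M = \frac{v(v-1)}{k(k-1)}$ and $N = v(1 + \frac{v-1}{k-1})$. The following sketch is our own, in Python with exact rational arithmetic; `steiner_etf_params` is a hypothetical helper. It only checks that the parameters are integers and says nothing about whether a (2, k, v)-Steiner system actually exists, which is the subject of Section 1.3.2.

```python
from fractions import Fraction

# Illustrative sketch; steiner_etf_params is a hypothetical helper name.
def steiner_etf_params(k, v):
    """Parameters of the ETF that Theorem 7 would build from a (2,k,v)-Steiner
    system. Returns None when r = (v-1)/(k-1) or b = v(v-1)/(k(k-1)) is not an
    integer; it does NOT check that such a Steiner system exists."""
    r = Fraction(v - 1, k - 1)
    b = Fraction(v * (v - 1), k * (k - 1))
    if r.denominator != 1 or b.denominator != 1:
        return None
    r, b = int(r), int(b)
    N = v * (1 + r)
    return {"M": b, "N": N, "r": r,
            "redundancy": Fraction(N, b),   # equals k(1 + (k-1)/(v-1))
            "density": Fraction(k, v),
            "hadamard_size": 1 + r}         # size of each H_j

for k, v in [(2, 4), (3, 7), (5, 45), (4, 28)]:
    p = steiner_etf_params(k, v)
    print(k, v, p["M"], p["N"], p["hadamard_size"])
```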

All two-element blocks: (2, 2, v)-Steiner ETFs for any v ≥ 2.

The first infinite family of Steiner systems is so simple that it is usually not discussed in the design-theory literature. For any v ≥ 2, let V be a v-element set, and let B be the collection of all 2-element subsets of V. Clearly, we have $b = \frac{v(v-1)}{2}$ blocks, each of which contains k = 2 elements; each point is contained in r = v - 1 blocks, and each pair of points is indeed contained in but a single block, that is, $\lambda = 1$.

By Theorem 7, the ETFs arising from these (2, 2, v)-Steiner systems consist of $N = v(1 + \frac{v-1}{k-1}) = v^2$ vectors in $M = \frac{v(v-1)}{k(k-1)} = \frac{v(v-1)}{2}$-dimensional space. Though these frames can become arbitrarily large, they do not provide any freedom with respect to redundancy: $\frac{N}{M} = \frac{2v}{v-1}$ is essentially 2. These frames have density $\frac{k}{v} = \frac{2}{v}$. Moreover, these ETFs can be real-valued if there exists a real Hadamard matrix of size $1 + \frac{v-1}{k-1} = v$. In particular, it suffices to have v be a power of 2; should the Hadamard conjecture prove true, it would suffice to have v divisible by 4.

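One example of such an ETF with v = 4 was given in the previous section. For a complex example, consider v = 3. The b × v transposed incidence matrix $A^{\mathrm{T}}$ is 3 × 3, with each row corresponding to a given 2-element subset of {0, 1, 2}:

$$A^{\mathrm{T}} = \begin{bmatrix}
+ & + &   \\
+ &   & + \\
  & + & +
\end{bmatrix}$$

To form the corresponding 3 × 9 ETF $\Phi$, we need a 3 × 3 unimodular matrix with orthogonal rows, such as a DFT; letting $\omega = e^{2\pi i/3}$, we can take

$$H = \begin{bmatrix}
1 & 1 & 1 \\
1 & \omega & \omega^2 \\
1 & \omega^2 & \omega
\end{bmatrix}$$

To form $\Phi$, in each column of $A^{\mathrm{T}}$ we replace each 1-valued entry with a distinct row of H. Always choosing the second and third rows yields an ETF of 9 elements in $\mathbb{C}^3$:

$$\Phi = \frac{1}{\sqrt{2}}\left[\begin{array}{ccc|ccc|ccc}
1 & \omega & \omega^2 & 1 & \omega & \omega^2 & & & \\
1 & \omega^2 & \omega & & & & 1 & \omega & \omega^2 \\
& & & 1 & \omega^2 & \omega & 1 & \omega^2 & \omega
\end{array}\right]$$

This is the only known instance of when the Steiner-based construction of Theorem 7 produces a maximal ETF, that is, one that has $N = M^2$.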
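The (2, 2, v) family is simple enough to generate in a few lines. The sketch below is our own illustration, assuming Python with NumPy; `two_block_etf` is a hypothetical name. It uses the v × v DFT for every $H_j$ and always its rows 2 through v, which for v = 3 reproduces the 3 × 9 frame displayed above, and it checks that the result is an ETF with $N = M^2$.

```python
import numpy as np
from itertools import combinations

# Illustrative sketch; two_block_etf is a hypothetical helper name.
def two_block_etf(v):
    """(2,2,v)-Steiner ETF from Theorem 7, using the v x v DFT for every H_j
    and always its rows 2..v (an illustrative choice)."""
    pairs = list(combinations(range(v), 2))          # the b = v(v-1)/2 blocks
    idx = np.arange(v)
    H = np.exp(2j * np.pi * np.outer(idx, idx) / v)  # v x v DFT, since 1 + r = v
    r = v - 1
    Phi = np.zeros((len(pairs), v * v), dtype=complex)
    for j in range(v):
        rows = [i for i, blk in enumerate(pairs) if j in blk]  # blocks containing j
        for h, i in enumerate(rows, start=1):
            Phi[i, j * v:(j + 1) * v] = H[h]
    return Phi / np.sqrt(r)

Phi = two_block_etf(3)
M, N = Phi.shape
G = Phi.conj().T @ Phi
```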



Steiner triple systems: (2, 3, v)-Steiner ETFs for any v ≡ 1, 3 mod 6.

Steiner triple systems, namely (2, 3, v)-Steiner systems, have been a subject of interest for over a century, and are known to exist precisely when v ≡ 1, 3 mod 6 [52]. Each of the $b = \frac{v(v-1)}{6}$ blocks contains k = 3 points, while each point is contained in $r = \frac{v-1}{2}$ blocks. The corresponding ETFs produced by Theorem 7 consist of $\frac{v(v+1)}{2}$ vectors in $\frac{v(v-1)}{6}$-dimensional space. The density of such frames is $\frac{3}{v}$. As with ETFs stemming from 2-element blocks, Steiner triple systems offer little freedom in terms of redundancy: $\frac{N}{M} = \frac{3(v+1)}{v-1}$ is always approximately 3. Such ETFs can be real if there exists a real Hadamard matrix of size $\frac{v+1}{2}$.

Four element blocks: (2, 4, v)-Steiner ETFs for any v ≡ 1, 4 mod 12.

It is known that (2, 4, v)-Steiner systems exist precisely when v ≡ 1, 4 mod 12 [1]. Continuing the trend of the previous two families, these ETFs can vary in size but not in redundancy: they consist of $\frac{v(v+2)}{3}$ vectors in $\frac{v(v-1)}{12}$-dimensional space, having redundancy $\frac{4(v+2)}{v-1}$ and density $\frac{4}{v}$. Interestingly, such frames can never be real: with the exception of the trivial 1 × 1 and 2 × 2 cases, the dimensions of all real Hadamard matrices are divisible by 4; since v ≡ 1, 4 mod 12, the requisite matrices H here are of size $\frac{v+2}{3}$ ≡ 1, 2 mod 4.

Five element blocks: (2, 5, v)-Steiner ETFs for any v ≡ 1, 5 mod 20.

It is also known that (2, 5, v)-Steiner systems exist precisely when v ≡ 1, 5 mod 20 [1]. The corresponding ETFs consist of $\frac{v(v+3)}{4}$ vectors in $\frac{v(v-1)}{20}$-dimensional space, having redundancy $\frac{5(v+3)}{v-1}$ and density $\frac{5}{v}$. Such frames can be real whenever there exists a real Hadamard matrix of size $\frac{v+3}{4}$. In particular, letting v = 45, we see that there exists a real Steiner ETF of 540 vectors in 99-dimensional space, a fact not obtained from any other known infinite family.

Affine geometries: (2, q, q^n)-Steiner ETFs for any prime power q, n ≥ 2.

At this point, the constructions depart from those previously considered, allowing both k and v to vary. In particular, using techniques from finite geometry, one can show that for any prime power q and any n ≥ 2, there exists a (2, k, v)-Steiner system with k = q and $v = q^n$ [52]. The corresponding ETFs consist of $q^n(1 + \frac{q^n-1}{q-1})$ vectors in $q^{n-1}(\frac{q^n-1}{q-1})$-dimensional space. Like the preceding four classes of Steiner ETFs, these frames can grow arbitrarily large: fixing any prime power q, one may manipulate n to produce ETFs of varying orders of magnitude. However, unlike the four preceding classes, these affine Steiner ETFs also provide great flexibility in choosing redundancy. That is, they provide the ability to pick M and N somewhat independently. Indeed, the redundancy of such frames, $q(1 + \frac{q-1}{q^n-1})$, is essentially q, which may be an arbitrary prime power. Moreover, as these frames grow large, they also become increasingly sparse: their density is $\frac{1}{q^{n-1}}$. Because of their high sparsity and flexibility with regard to size and redundancy, these frames, along with their projective geometry-based cousins detailed below, are perhaps the best known candidates for use in ETF-based applications. Such ETFs can be real if there exists a real Hadamard matrix of size $1 + \frac{q^n-1}{q-1}$, such as whenever q = 2, or when q = 5 and n = 3.
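For the smallest affine cases the Steiner system itself is easy to write down. The sketch below is our own, in Python with NumPy; `affine_plane_AT` and `steiner_etf_dft` are hypothetical helpers. It assumes q is prime, so that arithmetic mod q is field arithmetic (prime powers would need a finite-field implementation, which we do not sketch), builds the lines y = mx + c and x = c of the affine plane AG(2, q), checks that every pair of points lies on exactly one line, and applies Theorem 7 with a single $(q+2) \times (q+2)$ DFT. For q = 3 this gives the complex 12 × 45 entry of Table 1.2, with density $\frac{1}{q}$.

```python
import numpy as np

# Illustrative sketch; helper names are hypothetical.
def affine_plane_AT(q):
    """Transposed incidence matrix (lines x points) of the affine plane AG(2, q).
    Assumes q is PRIME, so arithmetic mod q is field arithmetic.
    Point (x, y) is indexed as x*q + y."""
    lines = [[x * q + (m * x + c) % q for x in range(q)]        # y = m x + c
             for m in range(q) for c in range(q)]
    lines += [[c * q + y for y in range(q)] for c in range(q)]  # x = c
    AT = np.zeros((len(lines), q * q), dtype=int)
    for i, line in enumerate(lines):
        AT[i, line] = 1
    return AT

def steiner_etf_dft(AT):
    """Theorem 7 with the same (1+r) x (1+r) DFT for every H_j, using rows 2..r+1."""
    b, v = AT.shape
    r = int(AT[:, 0].sum())
    n = r + 1
    idx = np.arange(n)
    H = np.exp(2j * np.pi * np.outer(idx, idx) / n)
    Phi = np.zeros((b, v * n), dtype=complex)
    for j in range(v):
        for h, i in enumerate(np.flatnonzero(AT[:, j]), start=1):
            Phi[i, j * n:(j + 1) * n] = H[h]
    return Phi / np.sqrt(r)

q = 3
AT = affine_plane_AT(q)
P = AT.T @ AT        # P[a, b] = number of lines through both points a and b
Phi = steiner_etf_dft(AT)
M, N = Phi.shape
G = Phi.conj().T @ Phi
```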
Projective geometries: (2, q + 1, (q^{n+1} - 1)/(q - 1))-Steiner ETFs for any prime power q, n ≥ 2.

With finite geometry, one can show that for any prime power q and any n ≥ 2, there exists a (2, k, v)-Steiner system with k = q + 1 and $v = \frac{q^{n+1}-1}{q-1}$ [?]. Qualitatively speaking, the ETFs that these projective geometries generate share much in common with their affinely generated cousins, possessing very high sparsity and great flexibility with respect to size and redundancy. The technical details are as follows: they consist of $\frac{q^{n+1}-1}{q-1}(1 + \frac{q^n-1}{q-1})$ vectors in $\frac{(q^{n+1}-1)(q^n-1)}{(q+1)(q-1)^2}$-dimensional space, with density $\frac{q^2-1}{q^{n+1}-1}$ and redundancy $(q+1)(1 + \frac{q-1}{q^n-1})$. These frames can be real if there exists a real Hadamard matrix of size $1 + \frac{q^n-1}{q-1}$; note this restriction is identical to the one for ETFs generated by affine geometries for the same q and n, implying that real Steiner ETFs generated by finite geometries always come in pairs, such as the 6 × 16 and 7 × 28 ETFs generated when q = 2, n = 2, and the 28 × 64 and 35 × 120 ETFs generated when q = 2, n = 3.
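The smallest real projective example can be built by hand. The sketch below is our own, in Python with NumPy. It assumes the standard description of the Fano plane (the projective plane with q = 2) as the translates {i, i+1, i+3} mod 7 of a difference set, checks that this is a (2, 3, 7)-Steiner system, and applies Theorem 7 with the 4 × 4 Sylvester Hadamard matrix, giving a real 7 × 28 ETF.

```python
import numpy as np

# Illustrative sketch. Fano plane as translates of the difference set {0, 1, 3} mod 7.
lines = [[(i + d) % 7 for d in (0, 1, 3)] for i in range(7)]
AT = np.zeros((7, 7), dtype=int)
for i, line in enumerate(lines):
    AT[i, line] = 1

H2 = np.array([[1, 1], [1, -1]])
H = np.kron(H2, H2)          # 4 x 4 Sylvester Hadamard matrix (1 + r = 4)

r = int(AT[:, 0].sum())      # each point lies on r = 3 lines
Phi = np.zeros((7, 7 * (r + 1)))
for j in range(7):
    for h, i in enumerate(np.flatnonzero(AT[:, j]), start=1):
        Phi[i, j * (r + 1):(j + 1) * (r + 1)] = H[h]
Phi /= np.sqrt(r)
G = Phi.T @ Phi
```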

Unitals: (2, q + 1, q^3 + 1)-Steiner ETFs for any prime power q.

For any prime power q, one can show that there exists a (2, k, v)-Steiner system with k = q + 1 and $v = q^3 + 1$ [52]. Though one may pick a redundancy of one's liking, such a choice confines one to ETFs of a given size: they consist of $(q^3+1)(q^2+1)$ vectors in $\frac{q^2(q^3+1)}{q+1}$-dimensional space, having redundancy $(q+1)(1 + \frac{1}{q^2})$ and density $\frac{q+1}{q^3+1}$. These ETFs can never be real: the requisite Hadamard matrices are of size $q^2 + 1$, which is never divisible by 4 since 0 and 1 are the only squares in $\mathbb{Z}_4$.

Denniston designs: (2, 2^r, 2^{r+s} + 2^r - 2^s)-Steiner ETFs for any 2 ≤ r < s.

For any 2 ≤ r < s, one can show that there exists a (2, k, v)-Steiner system with $k = 2^r$ and $v = 2^{r+s} + 2^r - 2^s$ [52]. By manipulating r and s, one can independently determine the order of magnitude of redundancy and size: the corresponding ETFs consist of $(2^s+2)(2^{r+s}+2^r-2^s)$ vectors in $\frac{(2^s+1)(2^{r+s}+2^r-2^s)}{2^r}$-dimensional space, having redundancy $\frac{2^r(2^s+2)}{2^s+1}$ and density $\frac{2^r}{2^{r+s}+2^r-2^s}$. As such, this family has some qualitative similarities to the families of ETFs produced by affine and projective geometries. However, unlike those families, the ETFs produced by Denniston designs can never be real: the requisite Hadamard matrices are of size $2^s + 2$, which is never divisible by 4.
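The three "never real" arguments above (four element blocks, unitals and Denniston designs) all reduce to residues modulo 4 of the required Hadamard sizes $\frac{v+2}{3}$, $q^2+1$ and $2^s+2$. The brute-force check below is our own, in Python; it scans finitely many parameters, so it illustrates the arguments rather than proving them. For unitals it scans every integer q ≥ 2, since the squares argument does not use that q is a prime power.

```python
# Illustrative finite check of the mod-4 arguments (not a proof).
# Hadamard sizes 1 + r required by three of the families in Table 1.1.
four_block = {((v + 2) // 3) % 4 for v in range(1, 5000) if v % 12 in (1, 4)}
unital = {(q * q + 1) % 4 for q in range(2, 5000)}
denniston = {(2 ** s + 2) % 4 for s in range(3, 200)}   # s > r >= 2, so s >= 3

print(sorted(four_block), sorted(unital), sorted(denniston))
```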

Name | M | N | Redundancy | Real? | Restrictions
2-blocks | $\frac{v(v-1)}{2}$ | $v^2$ | $\frac{2v}{v-1}$ | $v$ | none
3-blocks | $\frac{v(v-1)}{6}$ | $\frac{v(v+1)}{2}$ | $\frac{3(v+1)}{v-1}$ | $\frac{v+1}{2}$ | $v \equiv 1, 3 \bmod 6$
4-blocks | $\frac{v(v-1)}{12}$ | $\frac{v(v+2)}{3}$ | $\frac{4(v+2)}{v-1}$ | never | $v \equiv 1, 4 \bmod 12$
5-blocks | $\frac{v(v-1)}{20}$ | $\frac{v(v+3)}{4}$ | $\frac{5(v+3)}{v-1}$ | $\frac{v+3}{4}$ | $v \equiv 1, 5 \bmod 20$
Affine | $\frac{q^{n-1}(q^n-1)}{q-1}$ | $q^n(1+\frac{q^n-1}{q-1})$ | $q(1+\frac{q-1}{q^n-1})$ | $1+\frac{q^n-1}{q-1}$ | prime power q, $n \ge 2$
Projective | $\frac{(q^{n+1}-1)(q^n-1)}{(q+1)(q-1)^2}$ | $\frac{q^{n+1}-1}{q-1}(1+\frac{q^n-1}{q-1})$ | $(q+1)(1+\frac{q-1}{q^n-1})$ | $1+\frac{q^n-1}{q-1}$ | prime power q, $n \ge 2$
Unitals | $\frac{q^2(q^3+1)}{q+1}$ | $(q^3+1)(q^2+1)$ | $(q+1)(1+\frac{1}{q^2})$ | never | prime power q
Denniston | $\frac{(2^s+1)(2^{r+s}+2^r-2^s)}{2^r}$ | $(2^s+2)(2^{r+s}+2^r-2^s)$ | $\frac{2^r(2^s+2)}{2^s+1}$ | never | $2 \le r < s$

Table 1.1: Eight infinite families of Steiner ETFs, each arising from a known infinite family of (2, k, v)-Steiner designs. Each family permits both M and N to grow very large, but only a few families (affine, projective and Denniston) give one the freedom to simultaneously control the proportion between M and N, namely the redundancy $\frac{N}{M}$ of the ETF. The column denoted "Real?" indicates the size for which a real Hadamard matrix must exist in order for the resulting ETF to be real; it suffices to have this size be a power of 2; if the Hadamard conjecture is true, it would suffice for this number to be divisible by 4.

1.3.2 Conditions for the existence of Steiner equiangular tight frames

(2, k, v)-Steiner systems have been actively studied for over a century, with many celebrated results. Nevertheless, much about these systems is still unknown. In this subsection, we discuss some known partial characterizations of the Steiner systems which lie outside of the eight families we have already discussed, as well as what these results tell us about the existence of certain ETFs. To begin, recall that, for a given k and v, if a (2, k, v)-Steiner system exists, then the number r of blocks that contain a given point is necessarily $\frac{v-1}{k-1}$, while the total number of blocks b is $\frac{v(v-1)}{k(k-1)}$. As such, in order for a (2, k, v)-Steiner system to exist, it is necessary for (k, v) to be admissible, that is, to have the property that $\frac{v-1}{k-1}$ and $\frac{v(v-1)}{k(k-1)}$ are integers.
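Admissibility is a purely arithmetic condition and can be enumerated directly. The sketch below is our own, in Python; `admissible` is a hypothetical helper. For k = 6 and 6 < v < 50 it returns v = 16, 21, 31, 36 and 46. Admissibility alone does not decide existence: the (2, 6, 31)-Steiner system is realized by the projective plane with q = 5 (see Table 1.2), while the other four values are the subject of the next paragraph.

```python
# Illustrative sketch; admissible is a hypothetical helper name.
def admissible(k, v):
    """(k, v) is admissible if r = (v-1)/(k-1) and b = v(v-1)/(k(k-1)) are integers."""
    return (v - 1) % (k - 1) == 0 and (v * (v - 1)) % (k * (k - 1)) == 0

# Admissible v with k = 6 and k < v < 50.
print([v for v in range(7, 50) if admissible(6, v)])
```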

However, this property is not sufficient for existence: it is known that a (2, 6, 16)-Steiner system does not exist [1] despite the fact that $\frac{v-1}{k-1} = 3$ and $\frac{v(v-1)}{k(k-1)} = 8$. In fact, letting v be either 16, 21, 36, or 46 results in an admissible pair with k = 6, despite the fact that none of the corresponding Steiner systems exist; there are twenty-nine additional values of v which form an admissible pair with k = 6 and for which the existence of a corresponding Steiner system remains an open problem [1]. Similar difficulties arise with k ≥ 7. The good news is that admissibility, though not sufficient for existence, is, in fact, asymptotically sufficient: for any fixed k, there exists a corresponding admissible index $v_0(k)$ such that for all $v \ge v_0(k)$ for which $\frac{v-1}{k-1}$ and $\frac{v(v-1)}{k(k-1)}$ are integers, a (2, k, v)-Steiner system indeed exists [1]. Moreover, explicit values of $v_0(k)$ are known for small k: $v_0(6) = 801$, $v_0(7) = 2605$, $v_0(8) = 3753$, $v_0(9) = 16497$.

M | N | k | v | r | R/C | Construction of the Steiner system
6 | 16 | 2 | 4 | 3 | R | 2-blocks of v = 4; Affine with q = 2, n = 2
7 | 28 | 3 | 7 | 3 | R | 3-blocks of v = 7; Projective with q = 2, n = 2
28 | 64 | 2 | 8 | 7 | R | 2-blocks of v = 8; Affine with q = 2, n = 3
35 | 120 | 3 | 15 | 7 | R | 3-blocks of v = 15; Projective with q = 2, n = 3
66 | 144 | 2 | 12 | 11 | R | 2-blocks of v = 12
99 | 540 | 5 | 45 | 11 | R | 5-blocks of v = 45
3 | 9 | 2 | 3 | 2 | C | 2-blocks of v = 3
10 | 25 | 2 | 5 | 4 | C | 2-blocks of v = 5
12 | 45 | 3 | 9 | 4 | C | 3-blocks of v = 9; Affine with q = 3, n = 2
13 | 65 | 4 | 13 | 4 | C | 4-blocks of v = 13; Projective with q = 3, n = 2
15 | 36 | 2 | 6 | 5 | C | 2-blocks of v = 6
20 | 96 | 4 | 16 | 5 | C | 4-blocks of v = 16; Affine with q = 4, n = 2
21 | 49 | 2 | 7 | 6 | C | 2-blocks of v = 7
21 | 126 | 5 | 21 | 5 | C | 5-blocks of v = 21; Projective with q = 4, n = 2
26 | 91 | 3 | 13 | 6 | C | 3-blocks of v = 13
30 | 175 | 5 | 25 | 6 | C | 5-blocks of v = 25; Affine with q = 5, n = 2
31 | 217 | 6 | 31 | 6 | C | Projective with q = 5, n = 2
36 | 81 | 2 | 9 | 8 | C | 2-blocks of v = 9
45 | 100 | 2 | 10 | 9 | C | 2-blocks of v = 10
50 | 225 | 4 | 25 | 8 | C | 4-blocks of v = 25
55 | 121 | 2 | 11 | 10 | C | 2-blocks of v = 11
56 | 441 | 7 | 49 | 8 | C | Affine with q = 7, n = 2
57 | 190 | 3 | 19 | 9 | C | 3-blocks of v = 19
57 | 513 | 8 | 57 | 8 | C | Projective with q = 7, n = 2
63 | 280 | 4 | 28 | 9 | C | Unital with q = 3; Denniston with r = 2, s = 3
70 | 231 | 3 | 21 | 10 | C | 3-blocks of v = 21
72 | 640 | 8 | 64 | 9 | C | Affine with q = 8, n = 2
73 | 730 | 9 | 73 | 9 | C | Projective with q = 8, n = 2
78 | 169 | 2 | 13 | 12 | C | 2-blocks of v = 13
82 | 451 | 5 | 41 | 10 | C | 5-blocks of v = 41
90 | 891 | 9 | 81 | 10 | C | Affine with q = 9, n = 2
91 | 196 | 2 | 14 | 13 | C | 2-blocks of v = 14
91 | 1001 | 10 | 91 | 10 | C | Projective with q = 9, n = 2
100 | 325 | 3 | 25 | 12 | C | 3-blocks of v = 25

Table 1.2: The ETFs of dimension 100 or less that can be constructed by applying Theorem 7 to the eight infinite families of Steiner systems detailed in Section 1.3. That is, these ETFs represent the first few examples of the general constructions summarized in Table 1.1. For each ETF, we give the dimension M of the underlying space, the number of frame vectors N, as well as the number k of elements that lie in any block of a v-element set in the corresponding (2, k, v)-Steiner system. We further give the value r of the number of blocks that contain a given point; by Theorem 8, $|\langle f_n, f_{n'}\rangle| = \frac{1}{r}$ measures the angle between any two frame elements. We also indicate whether the given frame is real (R) or complex (C), and the method(s) of constructing the corresponding Steiner system.

We now detail the ramifications of these design-theoretic results on frame theory:

Theorem 8. If an M × N Steiner equiangular tight frame exists, then, letting $\alpha = \big(\frac{M(N-1)}{N-M}\big)^{1/2}$, the corresponding block design has parameters:

$$v = \frac{N}{1+\alpha}, \qquad b = M, \qquad r = \alpha, \qquad k = \frac{N\alpha}{M(1+\alpha)}.$$

In particular, if such a frame exists, then these expressions for v, k and r are necessarily integers. 

Conversely, for any fixed k ≥ 2, there exists an index $v_0(k)$ such that for all $v \ge v_0(k)$ for which $\frac{v-1}{k-1}$ and $\frac{v(v-1)}{k(k-1)}$ are integers, there exists a Steiner equiangular tight frame of $v(1 + \frac{v-1}{k-1})$ vectors for a space of dimension $\frac{v(v-1)}{k(k-1)}$.

In particular, for any fixed k ≥ 2, letting v be either $jk(k-1) + 1$ or $jk(k-1) + k$ for increasingly large values of j results in a sequence of Steiner equiangular tight frames whose redundancy is asymptotically k; these frames can be real if there exist real Hadamard matrices of sizes jk + 1 or jk + 2, respectively.

Proof. To prove the necessary conditions on M and N, recall that Steiner ETFs, namely those ETFs produced by Theorem 7, have $N = v(1 + \frac{v-1}{k-1})$ and $M = \frac{v(v-1)}{k(k-1)}$. Together, these two equations imply $N = v + kM$. Solving for k and substituting the resulting expression into $N = v(1 + \frac{v-1}{k-1})$ yields the quadratic equation $0 = (M-1)v^2 + 2(N-M)v - N(N-M)$. Its discriminant is $4(N-M)^2 + 4(M-1)N(N-M) = 4M(N-1)(N-M) = 4\alpha^2(N-M)^2$, so its only positive root is $v = \frac{(\alpha-1)(N-M)}{M-1}$; since $\alpha^2(N-M) = M(N-1)$ gives $M - 1 = \frac{(\alpha^2-1)(N-M)}{N}$, this simplifies to $v = \frac{N}{1+\alpha}$, as claimed. Substituting this expression for v into $N = v + kM$ yields $k = \frac{N\alpha}{M(1+\alpha)}$. Having v and k, the previously mentioned relations bk = vr and v - 1 = r(k - 1) imply $r = \frac{v-1}{k-1} = \alpha$ and $b = \frac{vr}{k} = M$, as claimed.

The second set of conclusions is the result of applying Theorem 7 to the aforementioned (2, k, v)-Steiner systems that are guaranteed to exist for all sufficiently large v, provided $\frac{v-1}{k-1}$ and $\frac{v(v-1)}{k(k-1)}$ are integers. The final set of conclusions is then obtained by applying this fact in the special cases where v is either $jk(k-1) + 1$ or $jk(k-1) + k$. In particular, if $v = jk(k-1) + 1$ then $\frac{v-1}{k-1} = jk$ and $M = \frac{v(v-1)}{k(k-1)} = j(jk(k-1) + 1)$ are integers, and the resulting ETF of $(jk+1)(jk(k-1)+1)$ vectors has a redundancy of $k + \frac{1}{j}$ that tends to k for large j; such an ETF can be real if there exists a real Hadamard matrix of size jk + 1. Meanwhile, if $v = jk(k-1) + k$ then $\frac{v-1}{k-1} = jk + 1$ and $M = \frac{v(v-1)}{k(k-1)} = (jk+1)(j(k-1)+1)$ are integers, and the resulting ETF of $k(jk+2)(j(k-1)+1)$ vectors has a redundancy of $\frac{k(jk+2)}{jk+1}$ that tends to k for large j; such an ETF can be real if there exists a real Hadamard matrix of size jk + 2. □

We conclude this section with a few thoughts on Theorems 7 and 8. First, we emphasize that the method of Theorem 7 is a method for constructing some ETFs, and by no means constructs them all. Indeed, as noted above, the redundancy of Steiner ETFs is always strictly greater than 2; while some of those ETFs with $\frac{N}{M} < 2$ will be the Naimark complements of Steiner ETFs, one must admit that the Steiner method contributes little towards the understanding of those ETFs with $\frac{N}{M} = 2$, such as those arising from Paley graphs [141]. Moreover, Theorem 8 implies that not even every ETF with $\frac{N}{M} > 2$ arises from a Steiner system: though there exists an ETF of 76 elements in $\mathbb{R}^{19}$ [141], the corresponding parameters of the design would be $v = \frac{38}{3}$, r = 5 and $k = \frac{10}{3}$, not all of which are integers.
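The necessary conditions of Theorem 8 are easy to test mechanically. The sketch below is our own, in Python with exact rational arithmetic; `design_params` and `is_steiner_candidate` are hypothetical helpers. It computes α and the implied (v, b, r, k) for a given M × N, returning None when α is irrational (in which case r = α cannot be an integer, so no Steiner ETF of that size exists). It reproduces the non-integral parameters of the 19 × 76 ETF just mentioned, as well as the parameters discussed in the next paragraph.

```python
from fractions import Fraction
from math import isqrt

# Illustrative sketch; helper names are hypothetical.
def design_params(M, N):
    """Parameters (v, b, r, k) that Theorem 8 forces on an M x N Steiner ETF,
    with alpha = sqrt(M(N-1)/(N-M)). Returns None if alpha is irrational."""
    alpha_sq = Fraction(M * (N - 1), N - M)
    p, q = alpha_sq.numerator, alpha_sq.denominator   # coprime after reduction
    if isqrt(p) ** 2 != p or isqrt(q) ** 2 != q:
        return None
    alpha = Fraction(isqrt(p), isqrt(q))
    return {"v": N / (1 + alpha), "b": M, "r": alpha,
            "k": N * alpha / (M * (1 + alpha))}

def is_steiner_candidate(M, N):
    """True if v, r and k are all integers (a necessary condition only)."""
    d = design_params(M, N)
    return d is not None and all(Fraction(d[x]).denominator == 1 for x in ("v", "r", "k"))

print(design_params(19, 76))
```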

That said, the method of Theorem 7 is truly significant: comparing Table 1.2 with a comprehensive list of all real ETFs of dimension 50 or less [141], we see the Steiner method produces 4 of the 17 ETFs that have redundancy greater than 2, namely the 6 × 16, 7 × 28, 28 × 64 and 35 × 120 ETFs. Interestingly, an additional 4 of these 17 ETFs can also be produced by the Steiner method, but only in complex form, namely those of 15 × 36, 20 × 96, 21 × 126 and 45 × 100 dimensions; it is unknown whether this is the result of a deficit in our analysis or the true non-existence of real-valued Steiner-based constructions of these sizes. The plot further thickens when one realizes that an additional 2 of these 17 real ETFs satisfy the necessary conditions of Theorem 8, but that the corresponding (2, k, v)-Steiner systems are known not to exist: if a 42 × 288 ETF were to arise as a result of Theorem 7, the corresponding Steiner system would have k = 6 and v = 36, while the 43 × 344 ETF would have k = 7 and v = 43; in fact, (2, 6, 36)- and (2, 7, 43)-Steiner systems cannot exist [1]. With our limited knowledge of the rich literature on Steiner systems, we were unable to resolve the existence of two remaining candidates: 23 × 276 and 46 × 736 ETFs could potentially arise from (2, 10, 46)- and (2, 14, 92)-Steiner systems, respectively, provided they exist.

1.4 Restricted isometry and digital fingerprinting 

In the previous section, we used Theorem 7 to construct many examples of Steiner ETFs. In this section, we investigate the feasibility of using such frames for applications in sparse signal processing. Regarding restricted isometry, one of the sad consequences of the Steiner construction method in Theorem 7 is that we now know there is a large class of ETFs for which the seemingly coarse estimate from the Gershgorin analysis (1.4) is, in fact, accurate. In particular, recall that Gershgorin guarantees that every M × N ETF is (K, δ)-RIP whenever $K < \delta\sqrt{M}$. Furthermore, recall from Theorem 7 that every Steiner ETF is built by carefully overlapping v regular simplices, each consisting of r + 1 vectors in an r-dimensional subspace of b-dimensional space. Thus, the corresponding subcollection of r + 1 vectors that lie in a given block is linearly dependent. Considering the value of r given in Theorem 8, we see that every Steiner ETF $\Phi$ contains r + 1 linearly dependent columns, where

$$r + 1 = \Big(\frac{M(N-1)}{N-M}\Big)^{1/2} + 1 < \sqrt{2M} + 1,$$

and the last inequality uses the fact that Steiner ETFs have redundancy $\frac{N}{M} > 2$. Therefore, Steiner ETFs are not $(K, 1-\varepsilon)$-RIP for any $K > \sqrt{2M}$; that is, they fail to break the square-root bottleneck. This raises the open question: Are there any ETFs which are as RIP as random matrices, or does being optimal in the Gershgorin sense necessarily come at the cost of being able to support large sparsity levels? In Chapter 3, we address this problem directly and make some interesting connections with graph theory and number theory, but we do not give a conclusive answer.
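The linear dependence behind this bound can be observed numerically. In the sketch below (our own, in Python with NumPy), the r + 1 = 4 columns of the first block of the 6 × 16 Steiner ETF of (1.6) have rank r = 3, and the first row of the Hadamard matrix, which the construction discards, gives an explicit 4-sparse vector in the null space of $\Phi$. Hence $\Phi$ satisfies no restricted isometry lower bound at sparsity K = 4, consistent with $K = 4 > \sqrt{2M} \approx 3.46$.

```python
import numpy as np

# Illustrative sketch: the 6 x 16 Steiner ETF of (1.6).
H = np.array([[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]])
AT = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1],
               [0, 1, 1, 0], [0, 1, 0, 1], [0, 0, 1, 1]])
r = 3
Phi = np.zeros((6, 16))
for j in range(4):
    for h, i in enumerate(np.flatnonzero(AT[:, j]), start=1):
        Phi[i, 4 * j:4 * j + 4] = H[h]
Phi /= np.sqrt(r)

block = Phi[:, :r + 1]                    # the r + 1 columns of the first block
sing = np.linalg.svd(block, compute_uv=False)
x = np.zeros(16)
x[:r + 1] = H[0]                          # discarded row of H: orthogonal to rows 2..4
residual = np.linalg.norm(Phi @ x)        # zero up to rounding, although ||x|| = 2
```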

Despite their provably suboptimal performance as RIP matrices, we will see that Steiner ETFs 
are particularly well-suited for the application of digital fingerprints. Digital media protection has 
become an important issue in recent years, as illegal distribution of licensed material has become 
increasingly prevalent. A number of methods have been proposed to restrict illegal distribution 
of media and ensure only licensed users are able to access it. One method involves cryptographic 
techniques, which encrypt the media before distribution. By doing this, only the users with appropriate
licensed hardware or software have access: satellite TV and DVDs are two such examples.
Unfortunately, cryptographic approaches are limited in that once the content is decrypted (legally 
or illegally), it can potentially be copied and distributed freely. 

An alternate approach involves marking each copy of the media with a unique signature. The signature could be a change in the bit sequence of the digital file or some noise-like distortion of the media. The unique signatures are called fingerprints, by analogy to the uniqueness of human fingerprints. With this approach, a licensed user could illegally distribute the file, only to be implicated by his fingerprint. The potential for prosecution acts as a deterrent to unauthorized distribution. However, fingerprinting systems are vulnerable when multiple users form a collusion by combining their copies to create a forged copy. This attack can reduce and distort the colluders' individual fingerprints, making identification of any particular user difficult. Some examples of potential attacks involve comparing the bit sequences of different copies, averaging copies in the signal space, as well as introducing noise, rotations, or cropping.

One of the principal approaches to designing fingerprints with robustness to collusions uses what is called the distortion assumption. In this regime, fingerprints are noise-like distortions to the media in signal space. In order to preserve the overall quality of the media, limits are placed on the magnitude of this distortion. The content owner limits the power of the fingerprint he adds, and the collusion limits the power of the noise they add in their attack. When applying the distortion assumption, the literature typically assumes that the collusion linearly averages their individual copies to forge the host signal. Also, while results using the distortion assumption tend to accommodate fewer users than those with other assumptions, this assumption is distinguished by its natural embedding of fingerprints, namely in the signal space.

Cox et al. introduced one of the first robust fingerprint designs under the distortion assumption [54]; the robustness was later analytically proven in [92]. Different fingerprint designs have since been studied, including orthogonal fingerprints [142] and simplex fingerprints [94]. We propose ETFs as a fingerprint design under the distortion assumption, and we analyze their performance against the worst-case collusion [103, 104]. Using analysis from Ergun et al. [66], we will show that ETFs perform particularly well as fingerprints; as a matter of fact, Steiner ETF fingerprints perform comparably to orthogonal and simplex fingerprints on average, while accommodating several times as many users [104]. We start by formally presenting the fingerprinting and collusion processes.

1.4.1 Problem setup 

A content owner has a host signal that he wishes to share, but he wants to mark it with fingerprints before distributing it. We view this host signal as a vector $s \in \mathbb{R}^M$, and the marked versions of this vector will be given to N > M users. Specifically, the nth user is given

$$s_n := s + \varphi_n,$$

where $\varphi_n$ denotes the nth fingerprint; we assume the fingerprints have equal norm. We wish to design the fingerprints $\{\varphi_n\}_{n=1}^N$ to be robust to a linear averaging attack. In particular, let $\mathcal{K} \subseteq \{1, \ldots, N\}$ denote a collection of users who together make a different copy of the host signal. Then their linear averaging attack produces a forgery:

$$f := \sum_{k\in\mathcal{K}} \lambda_k s_k + z, \qquad \sum_{k\in\mathcal{K}} \lambda_k = 1, \qquad \lambda_k \ge 0 \quad \forall k, \tag{1.7}$$

where z is a noise vector introduced by the colluders. This attack model is illustrated in Figure 1.1.

Certainly, the ultimate goal of the content owner is to detect every member of the forgery coalition. This can prove difficult in practice, though, particularly when some individuals contribute little to the forgery, with $x_k \ll 1/|\mathcal{K}|$. However, in the real world, if at least one colluder is caught, then other members could be identified through the legal process. As such, we consider focused detection, where a test statistic is computed for each user, and we perform a binary hypothesis test to decide whether that particular user is guilty.

Figure 1.1: The fingerprint and forgery processes. First, the content owner makes different copies of his host signal $s$ by adding fingerprints $\varphi_n$, which are unknown to the users. Next, a subcollection $\mathcal{K} \subseteq \{1, \ldots, N\}$ of the users collude to create a forgery $f$ by picking a convex combination of their copies and adding noise $z$. In this example, the forgery coalition $\mathcal{K}$ includes users 2, 3, and $N$.

Our detection procedure is as follows: With the cooperation of the content owner, the host signal can be subtracted from a forgery to isolate the fingerprint combination:

$$ y := f - s = \sum_{k \in \mathcal{K}} x_k \varphi_k + z. \tag{1.8} $$

To help the content owner discern who is guilty, we then use a normalized correlation function as a test statistic for each user $n$:

$$ T_n(y) := \frac{\langle y, \varphi_n \rangle}{\|\varphi_n\|_2^2}. $$

Having devised a test statistic, let $H_1(n)$ denote the guilty hypothesis ($n \in \mathcal{K}$) and $H_0(n)$ denote the innocent hypothesis ($n \notin \mathcal{K}$). Then picking some correlation threshold $\tau$, we use the following detector:

$$ \delta_\tau(n) := \begin{cases} H_1(n), & T_n(y) \geq \tau, \\ H_0(n), & T_n(y) < \tau. \end{cases} \tag{1.9} $$
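The fingerprinting, attack, and detection steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design analyzed below: the signals are real, the fingerprints are taken to be the unit-norm simplex (one of the designs cited above [94]), and the helper names `simplex_fingerprints`, `forge`, `correlation_statistics`, and `focused_detector` are hypothetical.

```python
import numpy as np

def simplex_fingerprints(M):
    """Hypothetical helper: unit-norm simplex with N = M + 1 columns in R^M.

    Builds the Gram matrix (ones on the diagonal, -1/M elsewhere) and factors it.
    """
    N = M + 1
    gram = (N / M) * np.eye(N) - np.ones((N, N)) / M
    lam, U = np.linalg.eigh(gram)
    keep = lam > 1e-10  # drop the zero eigenvalue (the all-ones direction)
    return np.sqrt(lam[keep])[:, None] * U[:, keep].T  # Phi.T @ Phi == gram

def forge(s, Phi, coalition, weights, sigma, rng):
    """Linear averaging attack (1.7): convex combination of the copies s + phi_k, plus noise z."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    copies = s[:, None] + Phi[:, coalition]  # column j is the copy given to user coalition[j]
    return copies @ weights + sigma * rng.standard_normal(s.shape)

def correlation_statistics(f, s, Phi):
    """T_n(y) = <y, phi_n> / ||phi_n||_2^2 for every user n, with y = f - s as in (1.8)."""
    y = f - s
    return (Phi.T @ y) / np.sum(Phi ** 2, axis=0)

def focused_detector(T, tau):
    """Detector (1.9): declare H_1(n) (guilty) exactly when T_n(y) >= tau."""
    return T >= tau

# Usage: users 1 and 2 collude with equal weights and no noise.
rng = np.random.default_rng(0)
M = 5
Phi = simplex_fingerprints(M)  # N = 6 users
s = rng.standard_normal(M)     # host signal
f = forge(s, Phi, [1, 2], [0.5, 0.5], sigma=0.0, rng=rng)
T = correlation_statistics(f, s, Phi)  # colluders: 0.5 - 0.5/5 = 0.4; others: -1/5 = -0.2
print(np.flatnonzero(focused_detector(T, tau=0.1)))  # [1 2]
```

With noise switched on (`sigma > 0`), the same code produces the Gaussian test statistics whose error probabilities are analyzed in Section 1.4.3.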
To determine the effectiveness of our fingerprint design and focused detector, we will investigate
the corresponding error probabilities, but first, we build our intuition for fingerprint design using a 
certain geometric figure of merit. 

1.4.2 A geometric figure of merit for fingerprint design 

For each user n, consider the distance between forgeries deriving from two types of potential collu- 
sions: those of which n is a member, and those of which n is not. Intuitively, if every fingerprint 
combination involving n is distant from every combination not involving n, then even with moderate 
noise, there should be little ambiguity as to whether the nth user was involved. To make this precise, 
for each user $n$, we define the "guilty" and "not guilty" sets of noiseless fingerprint combinations:

$$ \mathcal{G}_{K,n} := \bigg\{ \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \varphi_k : n \in \mathcal{K} \subseteq \{1, \ldots, N\},\ |\mathcal{K}| \leq K \bigg\}, $$

$$ \neg\mathcal{G}_{K,n} := \bigg\{ \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \varphi_k : n \notin \mathcal{K} \subseteq \{1, \ldots, N\},\ 1 \leq |\mathcal{K}| \leq K \bigg\}. $$

In words, $\mathcal{G}_{K,n}$ is the set of equal-weight combinations of at most $K$ fingerprints which include $n$, while $\neg\mathcal{G}_{K,n}$ is the set of such combinations which do not include $n$. Note that in our setup (1.7), the weights $x_k$ were arbitrary nonnegative values which sum to 1. We will show in Theorem 11 that the best attack from the collusion's perspective uses equal weights so that no single colluder is particularly vulnerable. From this perspective, it makes sense to bound the distance between these two sets:

$$ \operatorname{dist}(\mathcal{G}_{K,n}, \neg\mathcal{G}_{K,n}) := \min\big\{ \|y - y'\|_2 : y \in \mathcal{G}_{K,n},\ y' \in \neg\mathcal{G}_{K,n} \big\}. \tag{1.10} $$

Note that by taking $\Phi$ to be the $M \times N$ matrix whose columns are the fingerprints $\varphi_n$, the fingerprint combination (1.8) can be rewritten as $y = \Phi x + z$, where the entries of $x$ are $x_k$ when $k \in \mathcal{K}$ and zero otherwise. Thus, if the matrix of fingerprints $\Phi$ is $(K, \delta)$-RIP with $\delta < \sqrt{2} - 1$, then we can recover the $K$-sparse vector $x$ using Theorem 2. However, the error in the estimate $\hat{x}$ of $x$ will be on the order of 10 times the size of the noise $z$ [34]. Due to the potential legal ramifications of false accusations, this order of error is not tolerable. Note that the methods of compressed sensing recover the entire vector $x$, the support of which identifies the entire collusion. By contrast, while we will still investigate RIP matrices for fingerprint design, we will use focused detection (1.9) to identify colluders so as to minimize false accusations.

We now investigate how well RIP matrices perform with respect to our geometric figure of merit.
Without loss of generality, we assume the fingerprints are unit norm; since they have equal norm, the fingerprint combination can be scaled by the reciprocal of this common norm before the detection phase. With this in mind, we have the following lower bound on the distance (1.10) between the "guilty" and "not guilty" sets corresponding to any user $n$:

Theorem 9. Suppose fingerprints $\Phi = [\varphi_1 \cdots \varphi_N]$ have restricted isometry constant $\delta_{2K}$. Then

$$ \operatorname{dist}(\mathcal{G}_{K,n}, \neg\mathcal{G}_{K,n}) \geq \sqrt{\frac{1 - \delta_{2K}}{K(K-1)}}. \tag{1.11} $$



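Proof. Take $\mathcal{K}, \mathcal{K}' \subseteq \{1, \ldots, N\}$ such that $|\mathcal{K}|, |\mathcal{K}'| \leq K$ and $n \in \mathcal{K} \setminus \mathcal{K}'$. Then, since $|\mathcal{K} \cup \mathcal{K}'| \leq 2K$, the left-hand inequality of the restricted isometry property gives

$$ \begin{aligned} \bigg\| \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \varphi_k - \frac{1}{|\mathcal{K}'|} \sum_{k \in \mathcal{K}'} \varphi_k \bigg\|_2^2 &\geq (1 - \delta_{2K}) \bigg( \sum_{k \in \mathcal{K} \cap \mathcal{K}'} \Big( \frac{1}{|\mathcal{K}|} - \frac{1}{|\mathcal{K}'|} \Big)^2 + \sum_{k \in \mathcal{K} \setminus \mathcal{K}'} \frac{1}{|\mathcal{K}|^2} + \sum_{k \in \mathcal{K}' \setminus \mathcal{K}} \frac{1}{|\mathcal{K}'|^2} \bigg) \\ &= (1 - \delta_{2K}) \bigg( |\mathcal{K} \cap \mathcal{K}'| \Big( \frac{1}{|\mathcal{K}|} - \frac{1}{|\mathcal{K}'|} \Big)^2 + \frac{|\mathcal{K} \setminus \mathcal{K}'|}{|\mathcal{K}|^2} + \frac{|\mathcal{K}' \setminus \mathcal{K}|}{|\mathcal{K}'|^2} \bigg) \\ &= (1 - \delta_{2K}) \, \frac{|\mathcal{K}| + |\mathcal{K}'| - 2|\mathcal{K} \cap \mathcal{K}'|}{|\mathcal{K}||\mathcal{K}'|}. \end{aligned} \tag{1.12} $$

For a fixed $|\mathcal{K}|$, we will find a lower bound for

$$ \frac{|\mathcal{K}| + |\mathcal{K}'| - 2|\mathcal{K} \cap \mathcal{K}'|}{|\mathcal{K}||\mathcal{K}'|} = \frac{1}{|\mathcal{K}|} \bigg( 1 + \frac{|\mathcal{K}| - 2|\mathcal{K} \cap \mathcal{K}'|}{|\mathcal{K}'|} \bigg). \tag{1.13} $$

Since we can have $|\mathcal{K} \cap \mathcal{K}'| > |\mathcal{K}|/2$, we know $|\mathcal{K}| - 2|\mathcal{K} \cap \mathcal{K}'| < 0$ when (1.13) is minimized. That said, $|\mathcal{K}'|$ must be as small as possible, i.e., $|\mathcal{K}'| = |\mathcal{K} \cap \mathcal{K}'|$. Thus, when (1.13) is minimized, we have

$$ \frac{|\mathcal{K}| + |\mathcal{K}'| - 2|\mathcal{K} \cap \mathcal{K}'|}{|\mathcal{K}||\mathcal{K}'|} = \frac{1}{|\mathcal{K}|} \bigg( \frac{|\mathcal{K}|}{|\mathcal{K} \cap \mathcal{K}'|} - 1 \bigg), $$

i.e., $|\mathcal{K} \cap \mathcal{K}'|$ must be as large as possible. Since $n \in \mathcal{K} \setminus \mathcal{K}'$, we have $|\mathcal{K} \cap \mathcal{K}'| \leq |\mathcal{K}| - 1$. Therefore,

$$ \frac{|\mathcal{K}| + |\mathcal{K}'| - 2|\mathcal{K} \cap \mathcal{K}'|}{|\mathcal{K}||\mathcal{K}'|} \geq \frac{1}{|\mathcal{K}|} \bigg( \frac{|\mathcal{K}|}{|\mathcal{K}| - 1} - 1 \bigg) = \frac{1}{|\mathcal{K}|(|\mathcal{K}| - 1)}. \tag{1.14} $$

Substituting (1.14) into (1.12) gives

$$ \bigg\| \frac{1}{|\mathcal{K}|} \sum_{k \in \mathcal{K}} \varphi_k - \frac{1}{|\mathcal{K}'|} \sum_{k \in \mathcal{K}'} \varphi_k \bigg\|_2^2 \geq \frac{1 - \delta_{2K}}{|\mathcal{K}|(|\mathcal{K}| - 1)} \geq \frac{1 - \delta_{2K}}{K(K-1)}. $$

Since this bound holds for every $n$, $\mathcal{K}$ and $\mathcal{K}'$ with $n \in \mathcal{K} \setminus \mathcal{K}'$, taking square roots gives (1.11).

□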
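For tiny examples, Theorem 9 can be checked by brute force. The sketch below assumes the simplex fingerprints discussed after Corollary 10, computes $\delta_{2K}$ by exhaustive search, evaluates the distance (1.10) directly, and compares both with the exact simplex value derived in [94]. The helper names are hypothetical, and the exhaustive searches are only practical for very small $N$.

```python
import itertools
import numpy as np

def simplex_fingerprints(M):
    """Unit-norm simplex with N = M + 1 columns in R^M (pairwise inner products -1/M)."""
    N = M + 1
    lam, U = np.linalg.eigh((N / M) * np.eye(N) - np.ones((N, N)) / M)
    keep = lam > 1e-10
    return np.sqrt(lam[keep])[:, None] * U[:, keep].T

def rip_constant(Phi, s):
    """Brute-force restricted isometry constant delta_s of a real matrix.

    By eigenvalue interlacing it suffices to scan submatrices with exactly s columns.
    """
    delta = 0.0
    for S in itertools.combinations(range(Phi.shape[1]), s):
        sub = Phi[:, list(S)]
        eig = np.linalg.eigvalsh(sub.T @ sub)
        delta = max(delta, float(np.max(np.abs(eig - 1.0))))
    return delta

def guilty_distance(Phi, K, n):
    """Brute-force dist(G_{K,n}, not-G_{K,n}) in (1.10) over equal-weight averages of <= K columns."""
    N = Phi.shape[1]
    guilty, innocent = [], []
    for size in range(1, K + 1):
        for S in itertools.combinations(range(N), size):
            (guilty if n in S else innocent).append(Phi[:, list(S)].mean(axis=1))
    return min(np.linalg.norm(y - yp) for y in guilty for yp in innocent)

M, K = 5, 2
Phi = simplex_fingerprints(M)                 # N = 6 unit-norm fingerprints
delta = rip_constant(Phi, 2 * K)              # delta_{2K}
bound = np.sqrt((1 - delta) / (K * (K - 1)))  # right-hand side of (1.11)
exact = np.sqrt((M + 1) / (M * K * (K - 1)))  # exact simplex distance from [94]
dist = guilty_distance(Phi, K, n=0)
print(delta, bound, dist, exact)
```

Here the brute-force distance matches the exact simplex value, and the Theorem 9 bound sits below it, as it must.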
Combining Theorem 9 with the Gershgorin estimate $\delta_{2K} \leq (2K-1)\mu$ in terms of worst-case coherence $\mu$ yields the following:

Corollary 10. Suppose fingerprints $\Phi = [\varphi_1 \cdots \varphi_N]$ are unit-norm with worst-case coherence $\mu$. Then

$$ \operatorname{dist}(\mathcal{G}_{K,n}, \neg\mathcal{G}_{K,n}) \geq \sqrt{\frac{1 - (2K-1)\mu}{K(K-1)}}. \tag{1.15} $$

In words, Corollary 10 says that less coherent fingerprints provide a greater distance between the "guilty" and "not guilty" sets. It is therefore fitting to consider minimizers of worst-case coherence, namely equiangular tight frames. One type of ETF has already been proposed for fingerprint design: the simplex [94]. The simplex is an ETF with $N = M + 1$ and $\mu = 1/M$. In fact, [94] gives a derivation for the exact value of the distance (1.10) in this case:

$$ \operatorname{dist}(\mathcal{G}_{K,n}, \neg\mathcal{G}_{K,n}) = \sqrt{\frac{M+1}{M} \cdot \frac{1}{K(K-1)}}. \tag{1.16} $$

The bound (1.15) is lower than (1.16) by a factor of $\sqrt{1 - \frac{2K}{M+1}}$, and for practical cases in which $K \ll M$, the two are particularly close. Overall, ETF fingerprint design is a natural generalization of the provably optimal simplex design of [94].

Having applied the Gershgorin analysis to illustrate how ETF fingerprints perform with respect to our geometric figure of merit, we have yet to establish any fingerprint-specific consequences of Steiner ETFs not being as RIP as random matrices. Certainly, whether $K$ scales as $\sqrt{M}$ or $M$ is an important distinction in the compressed sensing community, but interestingly, in the context of fingerprints, this difference offers no advantage. To be clear, Ergun et al. [66] showed that for any fingerprinting system, there is a tradeoff between the probabilities of successful detection and false positives imposed by a linear-average-plus-noise attack from sufficiently large collusions. Specifically, a collusion of size $K = O(\sqrt{M/\log M})$ is sufficient to overcome the fingerprints, as the detector will not be able to identify any attacker without incurring a false-alarm probability that is too large to be admissible in court. This constraint is more restrictive than the coherence-based reconstruction guarantees which require $K = O(\sqrt{M})$, and so from this perspective, random RIP constructions are no better for fingerprint design than deterministic constructions.

1.4.3 Error analysis

We now investigate the errors associated with using ETF fingerprints and a focused correlation detector with linear-average-plus-noise attacks. To do this, we assume that the noise $z$ included in the attack (1.7) has independent Gaussian entries of mean zero and variance $\sigma^2$. One type of error we can expect is the false-positive error, in which an innocent user $n \notin \mathcal{K}$ is found guilty ($T_n(y) \geq \tau$). This could have significant ramifications in legal proceedings, so this error probability $\Pr[T_n(y) \geq \tau \mid H_0(n)]$ should be kept extremely low. To ensure this type of error is improbable, we consider the worst-case type I error probability, which depends on the fingerprint design $\Phi$, the correlation threshold $\tau$, and the weights $\{x_k\}_{k=1}^K$ used by the colluders in their linear average:

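$$ P_{\mathrm{I}}(\Phi, \tau, \{x_k\}_{k=1}^K) := \max_{\substack{\mathcal{K} \subseteq \{1, \ldots, N\} \\ |\mathcal{K}| = K}} \ \max_{\substack{\mathcal{K} \to \{x_k\} \\ \text{bijective}}} \ \max_{n \notin \mathcal{K}} \ \Pr[T_n(y) \geq \tau \mid H_0(n)]. \tag{1.17} $$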

In words, the probability that an innocent user $n$ is found guilty is no larger than $P_{\mathrm{I}}(\Phi, \tau, \{x_k\}_{k=1}^K)$, regardless of the coalition $\mathcal{K}$ or how the coalition members assign weights from $\{x_k\}_{k=1}^K$. The other error type is the false-negative error, in which a guilty user $n \in \mathcal{K}$ is found innocent ($T_n(y) < \tau$). In this case, since the goal of our detection is to catch at least one of the colluders, we define the worst-case type II error probability as follows:

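$$ P_{\mathrm{II}}(\Phi, \tau, \{x_k\}_{k=1}^K) := \max_{\substack{\mathcal{K} \subseteq \{1, \ldots, N\} \\ |\mathcal{K}| = K}} \ \max_{\substack{\mathcal{K} \to \{x_k\} \\ \text{bijective}}} \ \min_{n \in \mathcal{K}} \ \Pr[T_n(y) < \tau \mid H_1(n)]. \tag{1.18} $$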

This way, regardless of who the colluders are or how they assign the weights, at least one of the colluders will have a false-negative probability no larger than $P_{\mathrm{II}}(\Phi, \tau, \{x_k\}_{k=1}^K)$, meaning even in the worst-case scenario, we can correctly identify one of the colluders with probability at least $1 - P_{\mathrm{II}}$.

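Theorem 11. Take fingerprints as the columns of an $M \times N$ matrix $\Phi = [\varphi_1 \cdots \varphi_N]$, which, when normalized by the fingerprints' common norm $\gamma$, forms an equiangular tight frame. If the noise $z$ included in the attack (1.7) has independent Gaussian entries of mean zero and variance $\sigma^2$, then the worst-case type I and type II error probabilities, (1.17) and (1.18), satisfy

$$ P_{\mathrm{I}}(\Phi, \tau, \{x_k\}_{k=1}^K) \leq Q\Big( \frac{\gamma}{\sigma} (\tau - \mu) \Big), $$

$$ P_{\mathrm{II}}(\Phi, \tau, \{x_k\}_{k=1}^K) \leq Q\Big( \frac{\gamma}{\sigma} \big( (1 + \mu) \max\{x_k\}_{k=1}^K - \mu - \tau \big) \Big), $$

where $Q(x) := \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-u^2/2} \, du$ and $\mu = \sqrt{\frac{N - M}{M(N - 1)}}$.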

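Proof. To bound $P_{\mathrm{I}}(\Phi, \tau, \{x_k\}_{k=1}^K)$, assume a given user $n$ is innocent, i.e., $H_0(n)$. Then the test statistic for our detector (1.9) is given by

$$ T_n(y) = \frac{1}{\gamma^2} \Big\langle \sum_{k \in \mathcal{K}} x_k \varphi_k + z, \varphi_n \Big\rangle = \sum_{k \in \mathcal{K}} x_k \Big\langle \frac{\varphi_k}{\|\varphi_k\|}, \frac{\varphi_n}{\|\varphi_n\|} \Big\rangle + \frac{1}{\gamma} \Big\langle z, \frac{\varphi_n}{\|\varphi_n\|} \Big\rangle. $$

By the symmetry of $z$'s Gaussian distribution, we know the projection $\langle z, \frac{\varphi_n}{\|\varphi_n\|} \rangle$ also has Gaussian distribution with mean zero and variance $\sigma^2$, meaning our test statistic $T_n(y)$ has Gaussian distribution with mean $\sum_{k \in \mathcal{K}} x_k \langle \frac{\varphi_k}{\|\varphi_k\|}, \frac{\varphi_n}{\|\varphi_n\|} \rangle$ and variance $\sigma^2/\gamma^2$. Furthermore, since the normalized fingerprints form an ETF with worst-case coherence $\mu$, we can use the triangle inequality to bound the mean of $T_n(y)$:

$$ \bigg| \sum_{k \in \mathcal{K}} x_k \Big\langle \frac{\varphi_k}{\|\varphi_k\|}, \frac{\varphi_n}{\|\varphi_n\|} \Big\rangle \bigg| \leq \sum_{k \in \mathcal{K}} x_k \bigg| \Big\langle \frac{\varphi_k}{\|\varphi_k\|}, \frac{\varphi_n}{\|\varphi_n\|} \Big\rangle \bigg| = \mu \sum_{k \in \mathcal{K}} x_k = \mu. $$

We use this to bound the false-positive probability for user $n$:

$$ \Pr[T_n(y) \geq \tau \mid H_0(n)] = Q\Big( \frac{\gamma}{\sigma} \big( \tau - \mathbb{E}[T_n(y) \mid H_0(n)] \big) \Big) \leq Q\Big( \frac{\gamma}{\sigma} (\tau - \mu) \Big). $$

Since this bound holds for all coalitions, weight assignments and innocent users, this bound must also hold for $P_{\mathrm{I}}(\Phi, \tau, \{x_k\}_{k=1}^K)$.

Next, to bound $P_{\mathrm{II}}(\Phi, \tau, \{x_k\}_{k=1}^K)$, assume a given user $n$ is guilty, i.e., $H_1(n)$. In this case, the test statistic for our detector (1.9) is given by

$$ T_n(y) = \frac{1}{\gamma^2} \Big\langle \sum_{k \in \mathcal{K}} x_k \varphi_k + z, \varphi_n \Big\rangle = x_n + \sum_{\substack{k \in \mathcal{K} \\ k \neq n}} x_k \Big\langle \frac{\varphi_k}{\|\varphi_k\|}, \frac{\varphi_n}{\|\varphi_n\|} \Big\rangle + \frac{1}{\gamma} \Big\langle z, \frac{\varphi_n}{\|\varphi_n\|} \Big\rangle. $$

As before, $T_n(y)$ has Gaussian distribution with variance $\sigma^2/\gamma^2$, but this time, the mean is

$$ x_n + \sum_{\substack{k \in \mathcal{K} \\ k \neq n}} x_k \Big\langle \frac{\varphi_k}{\|\varphi_k\|}, \frac{\varphi_n}{\|\varphi_n\|} \Big\rangle \geq x_n - \sum_{\substack{k \in \mathcal{K} \\ k \neq n}} x_k \bigg| \Big\langle \frac{\varphi_k}{\|\varphi_k\|}, \frac{\varphi_n}{\|\varphi_n\|} \Big\rangle \bigg| = x_n - \mu \sum_{\substack{k \in \mathcal{K} \\ k \neq n}} x_k = (1 + \mu) x_n - \mu. $$

As such, the false-negative probability for user $n$ is

$$ \Pr[T_n(y) < \tau \mid H_1(n)] = Q\Big( \frac{\gamma}{\sigma} \big( \mathbb{E}[T_n(y) \mid H_1(n)] - \tau \big) \Big) \leq Q\Big( \frac{\gamma}{\sigma} \big( (1 + \mu) x_n - \mu - \tau \big) \Big). $$

Applying the definition of $P_{\mathrm{II}}(\Phi, \tau, \{x_k\}_{k=1}^K)$ and using that $Q$ is decreasing therefore gives

$$ \begin{aligned} P_{\mathrm{II}}(\Phi, \tau, \{x_k\}_{k=1}^K) &\leq \max_{\substack{\mathcal{K} \subseteq \{1, \ldots, N\} \\ |\mathcal{K}| = K}} \ \max_{\substack{\mathcal{K} \to \{x_k\} \\ \text{bijective}}} \ \min_{n \in \mathcal{K}} \ Q\Big( \frac{\gamma}{\sigma} \big( (1 + \mu) x_n - \mu - \tau \big) \Big) \\ &= Q\Big( \frac{\gamma}{\sigma} \big( (1 + \mu) \max\{x_k\}_{k=1}^K - \mu - \tau \big) \Big). \end{aligned} $$

□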



From Theorem 11, we can glean a few interesting insights about ETF fingerprints. First, the upper bound on $P_{\mathrm{I}}(\Phi, \tau, \{x_k\}_{k=1}^K)$ is independent of $\{x_k\}_{k=1}^K$, indicating that the coalition cannot pick weights in a way that frames an innocent user. Additionally, since $\max\{x_k\}_{k=1}^K \geq 1/K$ with equality precisely when the weights are equal, and since $Q$ is decreasing, the upper bound on $P_{\mathrm{II}}(\Phi, \tau, \{x_k\}_{k=1}^K)$ is maximized when the weights $x_k$ are equal, corresponding to our use of equal weights in the geometric figure of merit. This confirms our intuition that the coalition has the best chance of not being caught if no member is particularly vulnerable.
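Since the bounds in Theorem 11 are closed forms, they are easy to tabulate. The sketch below evaluates them for hypothetical parameters of our choosing: unit-norm fingerprints ($\gamma = 1$), $\sigma = 0.02$, $\tau = 0.2$, a coalition of three, and the dimensions $M = 51$, $N = 102$ of the redundancy-2 ETF obtained in Chapter 2 from Theorem 16 with the prime 101. The function names are ours. It illustrates the remarks above: changing the weights leaves the type I bound untouched, while moving away from equal weights shrinks the type II bound.

```python
import math

def welch_bound(M, N):
    """Worst-case coherence mu of an M x N equiangular tight frame."""
    return math.sqrt((N - M) / (M * (N - 1)))

def Q(x):
    """Gaussian tail probability Q(x) = Pr[Z >= x] for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def theorem11_bounds(M, N, gamma, sigma, tau, weights):
    """Upper bounds of Theorem 11 on the worst-case type I and type II error probabilities."""
    mu = welch_bound(M, N)
    p_false_alarm = Q(gamma / sigma * (tau - mu))                        # bound on P_I
    p_miss = Q(gamma / sigma * ((1 + mu) * max(weights) - mu - tau))     # bound on P_II
    return p_false_alarm, p_miss

# Usage: the same coalition size under equal and unequal weights.
for weights in ([1 / 3, 1 / 3, 1 / 3], [0.5, 0.25, 0.25]):
    print(weights, theorem11_bounds(51, 102, gamma=1.0, sigma=0.02, tau=0.2, weights=weights))
```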
Chapter 2



Full spark frames 



In the previous chapter, we reviewed how to use the Gershgorin circle theorem to demonstrate 
the restricted isometry property (RIP), and how identifying small spark disproves RIP. We then 
showed that Steiner equiangular tight frames (ETFs) are optimal in the Gershgorin sense, but have 
particularly small spark. Among other things, this illustrates that the "square-root bottleneck" 
with deterministic RIP matrices is not merely an artifact of the Gershgorin analysis. That said, 
as an intermediate goal to constructing RIP matrices, we seek deterministic matrices with large 
spark, understanding that RIP matrices necessarily have this property. To this end, one is naturally 
led to consider full spark matrices, that is, $M \times N$ matrices $\Phi$ with the largest spark possible: $\operatorname{Spark}(\Phi) = M + 1$. Equivalently, $M \times N$ full spark matrices have the property that every $M \times M$ submatrix is invertible; as such, a full spark matrix is necessarily full rank, and therefore a frame.
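Since the definition quantifies over all $\binom{N}{M}$ submatrices, the most direct check is exhaustive. The sketch below transcribes the definition literally. It is exponential in general, which is consistent with the hardness result discussed later in this chapter. The singular-value threshold `rtol` is our assumption, standing in for an exact nonzero-determinant test, and the helper names are hypothetical. The usage lines revisit the $4 \times 4$ DFT example from Section 2.1.

```python
import itertools
import numpy as np

def is_full_spark(Phi, rtol=1e-10):
    """Brute-force test: every M x M submatrix of the M x N matrix Phi is invertible.

    A submatrix is treated as invertible when its smallest singular value exceeds
    rtol times its largest one (a numerical stand-in for a nonzero determinant).
    """
    M, N = Phi.shape
    for cols in itertools.combinations(range(N), M):
        sv = np.linalg.svd(Phi[:, list(cols)], compute_uv=False)
        if sv[-1] <= rtol * sv[0]:
            return False
    return True

def dft(N):
    """N x N DFT matrix with unit-modulus entries omega**(m*n), omega = exp(-2*pi*i/N)."""
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N)

F4 = dft(4)
print(is_full_spark(F4[[0, 1], :]))  # True: rows 0 and 1 form a Vandermonde matrix with distinct bases
print(is_full_spark(F4[[0, 2], :]))  # False: columns 0 and 2 of these rows are both all-ones
```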

Interestingly, in sparse signal processing, the specific application of full spark frames has already 
been studied for some time. In 1997, Gorodnitsky and Rao [74] first considered full spark frames, 
referring to them as matrices with the unique representation property. Since [74], the unique rep- 
resentation property has been explicitly used to find a variety of performance guarantees for sparse 
signal processing [30, 105, 144]. Tang and Nehorai [133] also obtain performance guarantees using 
full spark frames, but they refer to them as non-degenerate measurement matrices. 

For another application of full spark frames, we consider the problem of reconstructing a signal from distorted frame coefficients. Specifically, we observe a scenario in which frame coefficients $\{(\Phi^* x)[n]\}_{n=1}^N$ are transmitted over a noisy or lossy channel before reconstructing the signal:

$$ y = \mathcal{D}(\Phi^* x), \qquad \hat{x} = (\Phi\Phi^*)^{-1} \Phi y, \tag{2.1} $$

where $\mathcal{D}(\cdot)$ represents the channel's random and not-necessarily-linear deformation process. Using an additive white Gaussian noise model, Goyal [75] established that, of all unit norm frames, unit norm tight frames minimize mean squared error in reconstruction. For the case of a lossy channel, Holmes and Paulsen [84] established that, of all tight frames, unit norm tight frames minimize worst-case error in reconstruction after one erasure, and that equiangular tight frames minimize this error after two erasures. We note that the reconstruction process in (2.1), namely the application of $(\Phi\Phi^*)^{-1}\Phi$, is inherently blind to the effect of the deformation process of the channel. This contrasts with Püschel and Kovačević's more recent work [113], which describes an adaptive process for reconstruction after multitudes of erasures. In this context, they reconstruct the signal after first identifying which frame coefficients were not erased; with this information, the signal can be estimated provided the corresponding frame elements span. In this sense, full spark frames are maximally robust to erasures, as coined in [113]. In particular, an $M \times N$ full spark frame is robust to $N - M$ erasures since any $M$ of the frame coefficients will uniquely determine the original signal.
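The erasure-robustness claim can be illustrated numerically. The sketch assumes, as in [113], that the receiver knows which coefficients survived. It uses the first three rows of the $7 \times 7$ DFT, a Vandermonde matrix with distinct bases and hence full spark (Lemma 12 below), together with a random test signal; the helper names are hypothetical. It compares the blind reconstruction (2.1) with solving the $M \times M$ system given by each possible set of surviving coefficients.

```python
import itertools
import numpy as np

def harmonic_frame(N, rows):
    """M x N matrix of selected DFT rows: entry (r, n) is exp(-2*pi*i*r*n/N)."""
    return np.exp(-2j * np.pi * np.outer(rows, np.arange(N)) / N)

def recover_from_survivors(Phi, coeffs, survivors):
    """Solve Phi_S^* x = y_S using only the M frame coefficients indexed by `survivors`."""
    A = Phi[:, survivors].conj().T  # M x M; invertible for every choice of S when Phi is full spark
    return np.linalg.solve(A, coeffs[survivors])

rng = np.random.default_rng(1)
N, rows = 7, [0, 1, 2]
Phi = harmonic_frame(N, rows)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
coeffs = Phi.conj().T @ x  # frame coefficients (Phi^* x)[n]

# Blind reconstruction (2.1) when nothing is erased.
x_blind = np.linalg.solve(Phi @ Phi.conj().T, Phi @ coeffs)

# Any N - M = 4 erasures are tolerable: recover from every possible set of M = 3 survivors.
all_ok = all(
    np.allclose(recover_from_survivors(Phi, coeffs, list(S)), x)
    for S in itertools.combinations(range(N), len(rows))
)
print(np.allclose(x_blind, x), all_ok)  # True True
```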

Yet another application of full spark frames is phaseless reconstruction, which can be viewed in terms of a channel, as in (2.1); in this case, $\mathcal{D}(\cdot)$ is the entrywise absolute value function. Phaseless reconstruction has a number of real-world applications including speech processing [15], X-ray crystallography [37], and quantum state estimation [116]. As such, there has been a lot of work to reconstruct an $M$-dimensional vector (up to an overall phase factor) from the magnitudes of its frame coefficients, most of which involves frames in operator space, which inherently require $N = \Omega(M^2)$ measurements [14, 116]. However, Balan et al. [15] show that if an $M \times N$ real frame $\Phi$ is full spark with $N \geq 2M - 1$, then $\mathcal{D} \circ \Phi^*$ is injective up to an overall sign, meaning an inversion process is possible with only $N = O(M)$ measurements. This result prompted an ongoing search for efficient phaseless reconstruction processes [13, 37], but no reconstruction process can succeed without a good family of frames, such as full spark frames.

Despite the fact that full spark frames have a multitude of applications, to date, there has not been much progress in constructing deterministic full spark frames, let alone full spark frames with additional desirable properties. A noteworthy exception is Püschel and Kovačević's work [113], in which real full spark tight frames are constructed using polynomial transforms. In the present chapter, we start by investigating Vandermonde frames, harmonic frames, and modifications thereof [2]. While the use of certain Vandermonde and harmonic frames as full spark frames is not new [30, 36, 72], the fruits of our investigation are new: For instance, we demonstrate that certain classes of ETFs are full spark, and we characterize the $M \times N$ full spark harmonic frames for which $N$ is a prime power. Later, we prove that verifying whether a matrix is full spark is hard for NP under randomized polynomial-time reductions [2]. In other words, assuming NP $\not\subseteq$ BPP (a computational complexity assumption slightly stronger than P $\neq$ NP and nearly as widely believed), there is no method by which one can efficiently test whether matrices are full spark. As such, the deterministic constructions we provide are significant in that they guarantee a property which is otherwise difficult to check. We conclude the chapter by introducing a new technique for efficient phaseless recovery, which explicitly makes use of deterministic full spark frames to design $N = O(M)$ measurements.



2.1 Deterministic constructions of full spark frames 

A square matrix is invertible if and only if its determinant is nonzero, and in our quest for deterministic constructions of full spark frames, this characterization will reign supreme. One class of matrices has a particularly simple determinant formula: Vandermonde matrices. Specifically, Vandermonde matrices have the following form:

$$ V = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \alpha_1 & \alpha_2 & \cdots & \alpha_N \\ \vdots & \vdots & & \vdots \\ \alpha_1^{M-1} & \alpha_2^{M-1} & \cdots & \alpha_N^{M-1} \end{bmatrix}, \tag{2.2} $$

and square Vandermonde matrices, i.e., with $N = M$, have the following determinant:

$$ \det(V) = \prod_{1 \leq i < j \leq M} (\alpha_j - \alpha_i). \tag{2.3} $$

Consider (2.2) in the case where $N > M$. Since every $M \times M$ submatrix of $V$ is also Vandermonde, we can modify the indices in (2.3) to calculate the determinant of the submatrices. These determinants are nonzero precisely when the bases $\{\alpha_n\}_{n=1}^N$ are distinct, yielding the following result:

Lemma 12. A Vandermonde matrix is full spark if and only if its bases are distinct. 
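Formula (2.3) and Lemma 12 are easy to confirm numerically on a small example. In this sketch the bases are arbitrary choices of ours, and `vandermonde` and `vandermonde_det` are hypothetical helper names. It compares every $3 \times 3$ determinant with the product formula and shows that a repeated base produces a singular submatrix.

```python
import itertools
import numpy as np

def vandermonde(bases, M):
    """M x N Vandermonde matrix (2.2): entry (m, n) equals bases[n] ** m."""
    bases = np.asarray(bases, dtype=complex)
    return bases[None, :] ** np.arange(M)[:, None]

def vandermonde_det(bases):
    """Determinant formula (2.3): product of (alpha_j - alpha_i) over i < j."""
    return np.prod([bases[j] - bases[i] for i, j in itertools.combinations(range(len(bases)), 2)])

bases = [0.5, -1.0, 2.0, 1j, 3.0]  # distinct bases, so Lemma 12 predicts full spark
M = 3
V = vandermonde(bases, M)
max_err, min_abs_det = 0.0, np.inf
for S in itertools.combinations(range(len(bases)), M):
    det = np.linalg.det(V[:, list(S)])
    max_err = max(max_err, abs(det - vandermonde_det([bases[k] for k in S])))
    min_abs_det = min(min_abs_det, abs(det))
print(max_err < 1e-9, min_abs_det > 0)  # True True

# Repeating a base makes every submatrix containing both copies singular.
W = vandermonde([1.0, 2.0, 1.0, 3.0], M)
print(abs(np.linalg.det(W[:, [0, 1, 2]])) < 1e-12)  # True
```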

To be clear, this result is not new. In fact, the full spark property of Vandermonde matrices was first exploited by Fuchs [72] for sparse signal processing. Later, Bourguignon et al. [30] specifically used the full spark property of Vandermonde matrices whose bases are sampled from the complex unit circle. Interestingly, when viewed in terms of frame theory, Vandermonde matrices naturally point to the discrete Fourier transform:



Theorem 13. The only $M \times N$ Vandermonde matrices that are equal norm and tight have bases in the complex unit circle. Among these, the frames with the smallest worst-case coherence have bases that are equally spaced in the complex unit circle, provided $N \geq 2M$.

Proof. Suppose a Vandermonde matrix is equal norm and tight. Note that a zero base will produce the zeroth identity basis element $\delta_0$. Letting $\mathcal{P}$ denote the indices of the nonzero bases, the fact that the matrix is full rank implies $|\mathcal{P}| \geq M - 1$. Also, equal norm gives that the frame element length

$$ \sum_{m=0}^{M-1} |\alpha_n^m|^2 = \sum_{m=0}^{M-1} |\alpha_n|^{2m} = \sum_{m=0}^{M-1} \big( |\alpha_n|^2 \big)^m $$

is constant over $n \in \mathcal{P}$. Since $\sum_{m=0}^{M-1} x^m$ is strictly increasing over $0 \leq x < \infty$, there exists $c > 0$ such that $|\alpha_n|^2 = c$ for all $n \in \mathcal{P}$. Next, tightness gives that the rows have equal norm, implying that the first and second rows (with entries $\alpha_n$ and $\alpha_n^2$) have equal norm, i.e., $|\mathcal{P}| c = |\mathcal{P}| c^2$. Thus $c = 1$, and so the nonzero bases are in the complex unit circle. Furthermore, since the zeroth and first rows have equal norm by tightness, we have $|\mathcal{P}| = N$, and so every base is in the complex unit circle.

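Now consider the inner product between Vandermonde frame elements whose bases $\{e^{2\pi i x_n}\}_{n=1}^N$ come from the complex unit circle:

$$ |\langle \varphi_n, \varphi_{n'} \rangle|^2 = \bigg| \sum_{m=0}^{M-1} (e^{2\pi i x_n})^m (e^{-2\pi i x_{n'}})^m \bigg|^2 = \bigg| \sum_{m=0}^{M-1} e^{2\pi i (x_n - x_{n'}) m} \bigg|^2. $$

We will show that the worst-case coherence comes from the two closest bases. Consider the following function:

$$ g(x) := \bigg| \sum_{m=0}^{M-1} e^{2\pi i x m} \bigg|^2. \tag{2.4} $$

Figure 2.1 gives a plot of this function in the case where $M = 5$. We will prove two things about this function:

(i) $g'(x) < 0$ for every $x \in (0, \frac{1}{2M})$,

(ii) $g(x) < g(\frac{1}{2M})$ for every $x \in (\frac{1}{2M}, 1 - \frac{1}{2M})$.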

First, we claim that (i) and (ii) are sufficient to prove our result. To establish this, we first show that the two closest bases $e^{2\pi i x_{n'}}$ and $e^{2\pi i x_{n''}}$ satisfy $|x_{n'} - x_{n''}| \leq \frac{1}{2M}$. Without loss of generality, the $n$'s are ordered in such a way that $\{x_n\}_{n=0}^{N-1} \subseteq [0, 1)$ is nondecreasing. Define

$$ d(x_n, x_{n+1}) := \begin{cases} x_{n+1} - x_n, & n = 0, \ldots, N-2, \\ x_0 - (x_{N-1} - 1), & n = N-1, \end{cases} $$

and let $n'$ be the $n$ which minimizes $d(x_n, x_{n+1})$. Since the minimum is no larger than the average, we have

$$ d(x_{n'}, x_{n'+1}) \leq \frac{1}{N} \bigg( \big( x_0 - (x_{N-1} - 1) \big) + \sum_{n=0}^{N-2} (x_{n+1} - x_n) \bigg) = \frac{1}{N} \leq \frac{1}{2M}, \tag{2.5} $$

provided $N \geq 2M$. Note that if we view $\{x_n\}_{n \in \mathbb{Z}_N}$ as members of $\mathbb{R}/\mathbb{Z}$, then $d(x_n, x_{n+1}) = x_{n+1} - x_n$. Since $g$ is even and 1-periodic, (i) implies that $|\langle \varphi_{n'+1}, \varphi_{n'} \rangle|^2 = g(x_{n'+1} - x_{n'})$ is at least as large as any other $g(x_p - x_{p'}) = |\langle \varphi_p, \varphi_{p'} \rangle|^2$ in which $x_p - x_{p'} \in [0, \frac{1}{2M}] \cup [1 - \frac{1}{2M}, 1)$. Also, (2.5) and (ii) together imply that $|\langle \varphi_{n'+1}, \varphi_{n'} \rangle|^2 = g(x_{n'+1} - x_{n'}) \geq g(\frac{1}{2M})$ is larger than any other $g(x_p - x_{p'}) = |\langle \varphi_p, \varphi_{p'} \rangle|^2$ in which $x_p - x_{p'} \in (\frac{1}{2M}, 1 - \frac{1}{2M})$, provided $N \geq 2M$. Combined, (i) and (ii) give that $|\langle \varphi_{n'+1}, \varphi_{n'} \rangle|$ achieves the worst-case coherence of $\{\varphi_n\}_{n \in \mathbb{Z}_N}$. Additionally, (i) gives that the worst-case coherence $|\langle \varphi_{n'+1}, \varphi_{n'} \rangle|$ is minimized when $x_{n'+1} - x_{n'}$ is maximized, i.e., when the $x_n$'s are equally spaced in the unit interval.

Figure 2.1: Plot of $g$ defined by (2.4) in the case where $M = 5$. Observe (i) that $g$ is strictly decreasing on the interval $(0, \frac{1}{10})$, and (ii) that $g(x) < g(\frac{1}{10})$ for every $x \in (\frac{1}{10}, \frac{9}{10})$. As established in the proof of Theorem 13, $g$ behaves in this manner for general values of $M$.

To prove (i), note that the geometric sum formula gives

$$ g(x) = \bigg| \sum_{m=0}^{M-1} e^{2\pi i x m} \bigg|^2 = \frac{|e^{2M\pi i x} - 1|^2}{|e^{2\pi i x} - 1|^2} = \frac{2 - 2\cos(2M\pi x)}{2 - 2\cos(2\pi x)} = \bigg( \frac{\sin(M\pi x)}{\sin(\pi x)} \bigg)^2, \tag{2.6} $$

where the final expression uses the identity $1 - \cos(2z) = 2\sin^2 z$. To show that $g$ is decreasing over $(0, \frac{1}{2M})$, note that the base of (2.6) is positive on this interval, and performing the quotient rule to calculate its derivative will produce a fraction whose denominator is nonnegative and whose numerator is given by

$$ M\pi \sin(\pi x) \cos(M\pi x) - \pi \sin(M\pi x) \cos(\pi x). \tag{2.7} $$

This factor is zero at $x = 0$ and has derivative

$$ -(M^2 - 1) \pi^2 \sin(\pi x) \sin(M\pi x), $$

which is strictly negative for all $x \in (0, \frac{1}{2M})$. Hence, (2.7) is strictly negative whenever $x \in (0, \frac{1}{2M})$, and so $g'(x) < 0$ for every $x \in (0, \frac{1}{2M})$.

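For (ii), note that for every $x \in (\frac{1}{2M}, 1 - \frac{1}{2M})$, we can individually bound the numerator and denominator of what the geometric sum formula gives:

$$ g(x) = \bigg| \sum_{m=0}^{M-1} e^{2\pi i x m} \bigg|^2 = \frac{|e^{2M\pi i x} - 1|^2}{|e^{2\pi i x} - 1|^2} < \frac{2^2}{|e^{\pi i/M} - 1|^2} = \frac{|e^{\pi i} - 1|^2}{|e^{\pi i/M} - 1|^2} = \bigg| \sum_{m=0}^{M-1} e^{\pi i m/M} \bigg|^2 = g\Big( \frac{1}{2M} \Big). $$

□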
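The two claims about $g$, the closed form (2.6), and the conclusion of Theorem 13 can be spot-checked on a grid. This is a numerical sanity check under our own choices ($M = 5$, $N = 12$, and an arbitrary perturbation of the equally spaced bases), not a substitute for the proof; the helper names are hypothetical.

```python
import numpy as np

def g(x, M):
    """g(x) = |sum_{m=0}^{M-1} exp(2*pi*i*x*m)|^2 from (2.4), summed directly."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    terms = np.exp(2j * np.pi * np.outer(x, np.arange(M)))
    return np.abs(terms.sum(axis=1)) ** 2

def g_closed_form(x, M):
    """Closed form (2.6), valid away from the integers."""
    return (np.sin(M * np.pi * x) / np.sin(np.pi * x)) ** 2

def worst_case_coherence(xs, M):
    """Worst-case coherence of the normalized M x N Vandermonde frame with bases exp(2*pi*i*x_n)."""
    V = np.exp(2j * np.pi * np.outer(np.arange(M), xs)) / np.sqrt(M)
    G = np.abs(V.conj().T @ V)
    np.fill_diagonal(G, 0.0)
    return G.max()

M, N = 5, 12  # N >= 2M, as Theorem 13 requires
h = 1 / (2 * M)
x = np.linspace(1e-3, 1 - 1e-3, 2001)
closed_form_ok = np.allclose(g(x, M), g_closed_form(x, M))
x1 = np.linspace(1e-4, h - 1e-4, 500)
claim_i = bool(np.all(np.diff(g(x1, M)) < 0))   # (i): g decreases on (0, 1/(2M))
x2 = np.linspace(h + 1e-4, 1 - h - 1e-4, 2000)
claim_ii = bool(np.all(g(x2, M) < g(h, M)[0]))  # (ii): g(x) < g(1/(2M)) on the rest

equal = np.arange(N) / N                         # equally spaced bases (a harmonic frame)
perturbed = equal + 0.01 * np.sin(np.arange(N))  # an arbitrary perturbation for comparison
mu_equal = worst_case_coherence(equal, M)
mu_perturbed = worst_case_coherence(perturbed, M)
predicted = abs(np.sin(M * np.pi / N) / np.sin(np.pi / N)) / M  # sqrt(g(1/N)) / M
print(closed_form_ok, claim_i, claim_ii, np.isclose(mu_equal, predicted), mu_perturbed > mu_equal)
```

The predicted value uses the proof's conclusion that, for equally spaced bases with $N \geq 2M$, the worst-case coherence comes from neighbors at distance $1/N$.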



Consider the $N \times N$ discrete Fourier transform (DFT) matrix, scaled to have entries of unit modulus:

$$ \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega & \omega^2 & \cdots & \omega^{N-1} \\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(N-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \omega^{N-1} & \omega^{2(N-1)} & \cdots & \omega^{(N-1)(N-1)} \end{bmatrix}, $$

where $\omega = e^{-2\pi i/N}$. The first $M$ rows of the DFT form a Vandermonde matrix of distinct bases $\{\omega^n\}_{n=0}^{N-1}$; as such, this matrix is full spark by Lemma 12. In fact, the previous result says that this is in some sense an optimal Vandermonde frame, but this might not be the best way to pick rows from a DFT. Indeed, several choices of DFT rows could produce full spark frames, some with smaller coherence or other desirable properties, and so the remainder of this section focuses on full spark DFT submatrices. First, we note that not every DFT submatrix is full spark. For example, consider the $4 \times 4$ DFT:

$$ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{bmatrix}. $$

Certainly, the submatrix formed by the zeroth and second rows of this matrix is not full spark, since the zeroth and second columns of this submatrix form the all-ones matrix, which is not invertible. So what can be said about the set of permissible row choices? The following result gives some necessary conditions on this set:

Theorem 14. Take an $N \times N$ discrete Fourier transform matrix, and select the rows indexed by $\mathcal{M} \subseteq \mathbb{Z}_N$ to build the matrix $\Phi$. If $\Phi$ is full spark, then so is the matrix built from rows indexed by

(i) any translation of $\mathcal{M}$,

(ii) any $A\mathcal{M}$ with $A$ relatively prime to $N$,

(iii) the complement of $\mathcal{M}$ in $\mathbb{Z}_N$.
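Before turning to the proof, the three closure properties can be checked exhaustively for a small non-prime size, where not every row choice is full spark. The sketch below is ours (hypothetical helper names; invertibility is judged numerically via singular values): it takes $N = 8$, every 3-element row set, and $A = 3$, and it records any full spark row set whose translate, dilate, or complement fails to be full spark.

```python
import itertools
import numpy as np

def dft(N):
    """N x N DFT with unit-modulus entries omega**(m*n), omega = exp(-2*pi*i/N)."""
    n = np.arange(N)
    return np.exp(-2j * np.pi * np.outer(n, n) / N)

def is_full_spark(Phi, rtol=1e-10):
    """True when every M x M submatrix has smallest singular value above rtol * largest."""
    M, N = Phi.shape
    for cols in itertools.combinations(range(N), M):
        sv = np.linalg.svd(Phi[:, list(cols)], compute_uv=False)
        if sv[-1] <= rtol * sv[0]:
            return False
    return True

def rows_full_spark(F, rows):
    """Is the submatrix of F with rows indexed by `rows` full spark?"""
    return is_full_spark(F[sorted(rows), :])

N, size, A = 8, 3, 3  # N = 8 is not prime; A = 3 is relatively prime to 8
F = dft(N)
n_full, violations = 0, []
for rows in itertools.combinations(range(N), size):
    if not rows_full_spark(F, rows):
        continue
    n_full += 1
    images = {
        "translation (i)": {(r + 1) % N for r in rows},
        "dilation (ii)": {(A * r) % N for r in rows},
        "complement (iii)": set(range(N)) - set(rows),
    }
    for name, image in images.items():
        if not rows_full_spark(F, image):
            violations.append((rows, name))
print(n_full, "full spark row sets of size", size, "; violations:", violations)
```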

Proof. For (i), we first define $D$ to be the $N \times N$ diagonal matrix whose diagonal entries are $\{\omega^n\}_{n=0}^{N-1}$. Note that, since $\omega^{(m+1)n} = \omega^n \omega^{mn}$, translating the row indices by 1 corresponds to multiplying $\Phi$ on the right by $D$. For some set $\mathcal{K} \subseteq \mathbb{Z}_N$ of size $M := |\mathcal{M}|$, let $\Phi_{\mathcal{K}}$ denote the $M \times M$ submatrix of $\Phi$ whose columns are indexed by $\mathcal{K}$, and let $D_{\mathcal{K}}$ denote the $M \times M$ diagonal submatrix of $D$ whose diagonal entries are indexed by $\mathcal{K}$. Then since $D_{\mathcal{K}}$ is unitary, we have

$$ |\det((\Phi D)_{\mathcal{K}})| = |\det(\Phi_{\mathcal{K}} D_{\mathcal{K}})| = |\det(\Phi_{\mathcal{K}})| |\det(D_{\mathcal{K}})| = |\det(\Phi_{\mathcal{K}})|. $$

Thus, if $\Phi$ is full spark, $|\det((\Phi D)_{\mathcal{K}})| = |\det(\Phi_{\mathcal{K}})| > 0$, and so $\Phi D$ is also full spark. Using this fact inductively proves (i) for all translations of $\mathcal{M}$.

For (ii), let $\Psi$ denote the submatrix of rows indexed by $A\mathcal{M}$. Then for any $\mathcal{K} \subseteq \mathbb{Z}_N$ of size $M$,

$$ \det(\Psi_{\mathcal{K}}) = \det\big( \omega^{(Am)k} \big)_{m \in \mathcal{M},\, k \in \mathcal{K}} = \det\big( \omega^{m(Ak)} \big)_{m \in \mathcal{M},\, k \in \mathcal{K}} = \det(\Phi_{A\mathcal{K}}). $$

Since $A$ is relatively prime to $N$, multiplication by $A$ permutes the elements of $\mathbb{Z}_N$, and so $A\mathcal{K}$ has exactly $M$ distinct elements. Thus, if $\Phi$ is full spark, then $\det(\Psi_{\mathcal{K}}) = \det(\Phi_{A\mathcal{K}}) \neq 0$, and so $\Psi$ is also full spark.

For (iii), we let $\Psi$ be the $(N - M) \times N$ submatrix of rows indexed by $\mathcal{M}^c$, so that

$$ N I_N = \Phi^* \Phi + \Psi^* \Psi. \tag{2.8} $$

We will use contraposition to show that $\Phi$ being full spark implies that $\Psi$ is also full spark. To this end, suppose $\Psi$ is not full spark. Then $\Psi$ has a collection of $N - M$ linearly dependent columns $\{\psi_i\}_{i \in \mathcal{K}}$, and so there exists a nontrivial sequence $\{a_i\}_{i \in \mathcal{K}}$ such that

$$ \sum_{i \in \mathcal{K}} a_i \psi_i = 0. $$

Considering $\psi_i = \Psi \delta_i$, where $\delta_i$ is the $i$th identity basis element, we can use (2.8) to express this linear dependence in terms of $\Phi$:

$$ 0 = \Psi^* \Psi \sum_{i \in \mathcal{K}} a_i \delta_i = (N I_N - \Phi^* \Phi) \sum_{i \in \mathcal{K}} a_i \delta_i. $$

Rearranging then gives

$$ x := \sum_{i \in \mathcal{K}} a_i \delta_i = \frac{1}{N} \Phi^* \Phi \sum_{i \in \mathcal{K}} a_i \delta_i. \tag{2.9} $$

Here, we note that $x$ is nonzero since $\{a_i\}_{i \in \mathcal{K}}$ is nontrivial, and that $x \in \operatorname{Range}(\Phi^* \Phi)$. Furthermore, whenever $j \notin \mathcal{K}$, we have from (2.9) that

$$ \langle x, \Phi^* \Phi \delta_j \rangle = \langle \Phi^* \Phi x, \delta_j \rangle = N \langle x, \delta_j \rangle = 0, $$

and so $x \perp \operatorname{Span}\{\Phi^* \Phi \delta_j\}_{j \in \mathcal{K}^c}$. Thus, the containment $\operatorname{Span}\{\Phi^* \Phi \delta_j\}_{j \in \mathcal{K}^c} \subseteq \operatorname{Range}(\Phi^* \Phi)$ is proper, and so

$$ M = \operatorname{Rank}(\Phi) = \operatorname{Rank}(\Phi^* \Phi) > \operatorname{Rank}(\Phi^* \Phi_{\mathcal{K}^c}) = \operatorname{Rank}(\Phi_{\mathcal{K}^c}). $$

Since the $M \times M$ submatrix $\Phi_{\mathcal{K}^c}$ is rank-deficient, it is not invertible, and therefore $\Phi$ is not full spark.

□

We note that our proof of (iii) above uses techniques from Cahill et al. [32], and can be easily generalized to prove that the Naimark complement of a full spark tight frame is also full spark. Theorem 14 tells us quite a bit about the set of permissible choices for DFT rows. For example, not only can we pick the first $M$ rows of the DFT to produce a full spark Vandermonde frame, but we can also pick any consecutive $M$ rows, by Theorem 14(i). We would like to completely characterize the choices that produce full spark harmonic frames. The following classical result does this in the case where $N$ is prime:

Theorem 15 (Chebotarev, see [126]). Let $N$ be prime. Then every square submatrix of the $N \times N$ discrete Fourier transform matrix is invertible.

As an immediate consequence of Chebotarev's theorem, every choice of rows from the DFT produces a full spark harmonic frame, provided $N$ is prime. This application of Chebotarev's theorem was first used by Candès et al. [36] for sparse signal processing. Note that each of these frames is equal-norm and tight by construction. Harmonic frames can also be designed to have minimal coherence; Xia et al. [146] produce harmonic equiangular tight frames by selecting row indices which form a difference set in $\mathbb{Z}_N$. Interestingly, most known families of difference sets
in $\mathbb{Z}_N$ require $N$ to be prime [87], and so the corresponding harmonic equiangular tight frames are
guaranteed to be full spark by Chebotarev's theorem. In the following, we use Chebotarev's theorem 
to demonstrate full spark for a class of frames which contains harmonic frames, namely, frames which 
arise from concatenating harmonic frames with any number of identity basis elements: 

Theorem 16 (cf. [131, Theorem 1.1]). Let $N$ be prime, and pick any $M < N$ rows of the $N \times N$ discrete Fourier transform matrix to form the harmonic frame $H$. Next, pick any $K \leq M$, and take $D$ to be the $M \times M$ diagonal matrix whose first $K$ diagonal entries are $\sqrt{\frac{N + K - M}{MN}}$, and whose remaining $M - K$ entries are $\sqrt{\frac{N + K}{MN}}$. Then concatenating $DH$ with the first $K$ identity basis elements produces an $M \times (N + K)$ full spark unit norm tight frame.

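As an example, when $N = 5$ and $K = 1$, we can pick $M = 3$ rows of the $5 \times 5$ DFT which are indexed by $\{0, 1, 4\}$. In this case, $D$ makes the entries of the first DFT row have size $\frac{1}{\sqrt{5}}$ and the entries of the remaining rows have size $\sqrt{\frac{2}{5}}$. Concatenating with the first identity basis element then produces an equiangular tight frame which is full spark:

$$ \Phi = \begin{bmatrix} \sqrt{\frac{1}{5}} & \sqrt{\frac{1}{5}} & \sqrt{\frac{1}{5}} & \sqrt{\frac{1}{5}} & \sqrt{\frac{1}{5}} & 1 \\ \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} e^{-2\pi i/5} & \sqrt{\frac{2}{5}} e^{-2\pi i 2/5} & \sqrt{\frac{2}{5}} e^{-2\pi i 3/5} & \sqrt{\frac{2}{5}} e^{-2\pi i 4/5} & 0 \\ \sqrt{\frac{2}{5}} & \sqrt{\frac{2}{5}} e^{-2\pi i 4/5} & \sqrt{\frac{2}{5}} e^{-2\pi i 3/5} & \sqrt{\frac{2}{5}} e^{-2\pi i 2/5} & \sqrt{\frac{2}{5}} e^{-2\pi i/5} & 0 \end{bmatrix}. \tag{2.10} $$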



Proof of Theorem 16. Let $\Phi$ denote the resulting $M \times (N + K)$ frame. We start by verifying that $\Phi$ is unit norm. Certainly, the identity basis elements have unit norm. For the remaining frame elements, the modulus of each entry is determined by $D$, and so the norm squared of each frame element is

$$ K \Big( \frac{N + K - M}{MN} \Big) + (M - K) \Big( \frac{N + K}{MN} \Big) = \frac{MN}{MN} = 1. $$

To demonstrate that $\Phi$ is tight, it suffices to show that $\Phi \Phi^* = \frac{N + K}{M} I_M$. The rows of $DH$ are orthogonal since they are scaled rows of the DFT, while the rows of the identity portion are orthogonal because they have disjoint support. Thus, $\Phi \Phi^*$ is diagonal. Moreover, the norm squared of each of the first $K$ rows is $N \big( \frac{N + K - M}{MN} \big) + 1 = \frac{N + K}{M}$, while the norm squared of each of the remaining rows is $N \big( \frac{N + K}{MN} \big) = \frac{N + K}{M}$, and so $\Phi \Phi^* = \frac{N + K}{M} I_M$.

To show that $\Phi$ is full spark, note that every $M \times M$ submatrix of $DH$ is invertible since

$$ |\det((DH)_{\mathcal{K}})| = |\det(D H_{\mathcal{K}})| = |\det(D)| |\det(H_{\mathcal{K}})| > 0 $$

by Chebotarev's theorem. Also, in the case where $K = M$, we note that the $M \times M$ submatrix of $\Phi$ composed solely of identity basis elements is trivially invertible. The only remaining case to check is when identity basis elements and columns of $DH$ appear in the same $M \times M$ submatrix $\Phi_{\mathcal{K}}$. In this case, we may shuffle the rows of $\Phi_{\mathcal{K}}$ to have the form

$$ \begin{bmatrix} A & 0 \\ B & I \end{bmatrix}, $$

where $I$ is an identity matrix whose size equals the number of identity basis elements in $\Phi_{\mathcal{K}}$. Since shuffling rows has no impact on the size of the determinant, we may further use a determinant identity on block matrices to get

$$ |\det(\Phi_{\mathcal{K}})| = \bigg| \det \begin{bmatrix} A & 0 \\ B & I \end{bmatrix} \bigg| = |\det(A) \det(I)| = |\det(A)|. $$

Since $A$ is the product of an invertible diagonal matrix (a submatrix of $D$) and a square submatrix of the $N \times N$ DFT, we are done by Chebotarev's theorem. □

As an example of Theorem 16, pick $N$ to be a prime congruent to 1 mod 4, and select rows
of the $N \times N$ DFT according to the index set $\mathcal{M} := \{k^2 : k \in \mathbb{Z}_N\}$. If we take $K = 1$, the process in
Theorem 16 produces an equiangular tight frame of redundancy 2, which we will verify in the next 
chapter using quadratic Gauss sums; in the case where N = 5, this construction produces (2.10). 
Note that this corresponds to a special case of a construction in Zauner's thesis [150], which was 
later studied by Renes [115] and Strohmer [128]. Theorem 16 says that this construction is full 
spark. 
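A quick numerical check of this example follows. It assumes that row 0 receives the first diagonal entry of $D$, as in (2.10). The hypothetical helper `qr_harmonic_etf` selects the squares in $\mathbb{Z}_N$, applies Theorem 16 with $K = 1$, and confirms two things:

- the redundancy is 2 and the frame is tight;
- all off-diagonal Gram entries have modulus $1/\sqrt{N}$, the Welch bound for $N+1$ vectors in dimension $(N+1)/2$.

This is a finite spot check, not the Gauss sum argument referred to above.

```python
import numpy as np


def qr_harmonic_etf(N):
    """Theorem 16 with K = 1 and rows indexed by the squares in Z_N (N prime, N = 1 mod 4)."""
    rows = sorted({(k * k) % N for k in range(N)})       # starts with 0; M = (N + 1) / 2 rows
    M = len(rows)
    H = np.exp(-2j * np.pi * np.outer(rows, np.arange(N)) / N)
    diag = np.full(M, np.sqrt((N + 1) / (M * N)))
    diag[0] = np.sqrt((N + 1 - M) / (M * N))             # first K = 1 diagonal entry, on row 0
    e1 = np.zeros((M, 1))
    e1[0, 0] = 1.0
    return np.hstack([diag[:, None] * H, e1])


for N in (5, 13, 17, 29):
    Phi = qr_harmonic_etf(N)
    M, n_cols = Phi.shape
    gram = np.abs(Phi.conj().T @ Phi)
    off_diag = gram[~np.eye(n_cols, dtype=bool)]
    print(N, n_cols / M,                                    # redundancy
          np.allclose(off_diag, 1 / np.sqrt(N)),            # equiangular
          np.allclose(Phi @ Phi.conj().T, 2 * np.eye(M)))   # tight
```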

Maximally sparse frames have recently become a subject of active research [44, 70]. We note 
that when $K = M$, Theorem 16 produces a maximally sparse $M \times (N+K)$ full spark frame, having
a total of $M(M-1)$ zero entries. To see that this sparsity level is maximal, we note that if the
frame had any more zero entries, then at least one of the rows would have at least $M$ zero entries, meaning
the corresponding M x M submatrix would have a row of all zeros and hence a zero determinant. 
Similar ideas were studied previously by Nakamura and Masson [107]. 

Another interesting case is where K = M = N, i.e., when the frame constructed in Theorem 16 
is a union of the unitary DFT and identity bases. Unions of orthonormal bases have received 
considerable attention in the context of sparse approximation [61, 136]. In fact, when $N$ is a perfect
square, concatenating the DFT with an identity basis forms the canonical example $\Phi$ of a dictionary
with small spark [61], and we used this example in the previous chapter. Recall the Dirac comb of
$\sqrt{N}$ spikes is an eigenvector of the DFT, and so concatenating this comb with the negative of its
Fourier transform produces a $2\sqrt{N}$-sparse vector in the nullspace of $\Phi$. In stark contrast, when $N$
is prime, Theorem 16 shows that $\Phi$ is full spark.
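The perfect-square case can be seen concretely with a small NumPy sketch for $N = 9$. It normalizes the DFT as $e^{-2\pi i mn/N}/\sqrt{N}$ and checks two things:

- the normalized comb on multiples of $\sqrt{N}$ is fixed by the unitary DFT;
- stacking the comb on top of minus its Fourier transform gives a $2\sqrt{N}$-sparse vector in the nullspace of $\Phi$.

```python
import numpy as np

N = 9                                           # a perfect square
r = 3                                           # sqrt(N) spikes
F = np.fft.fft(np.eye(N)) / np.sqrt(N)          # unitary DFT, entries exp(-2*pi*i*m*n/N)/sqrt(N)
Phi = np.hstack([F, np.eye(N)])                 # union of the DFT and identity bases

comb = np.zeros(N)
comb[::r] = 1 / np.sqrt(r)                      # normalized Dirac comb on multiples of sqrt(N)
z = np.concatenate([comb, -F @ comb])           # comb on top of minus its Fourier transform

print(np.allclose(F @ comb, comb))              # True: the comb is fixed by the DFT
print(np.allclose(Phi @ z, 0))                  # True: z lies in the nullspace of Phi
print(np.count_nonzero(np.abs(z) > 1e-12))      # 6 = 2 sqrt(N) nonzero entries
```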

The vast implications of Chebotarev's theorem lead one to wonder whether the result admits any interesting generalization. In this direction, Candès et al. [36] note that any such generalization must somehow account for the nontrivial subgroups of $\mathbb{Z}_N$ which are not present when $N$ is prime. Certainly, if one could characterize the full spark submatrices of a general DFT, this would provide ample freedom to optimize full spark frames for additional considerations. While we do not have a characterization for the general case, we do have one for the case where $N$ is a prime power. Before stating the result, we require a definition:

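Definition 17. We say a subset $\mathcal{M} \subseteq \mathbb{Z}_N$ is uniformly distributed over the divisors of $N$ if, for every divisor $d$ of $N$, the $d$ cosets of $\langle d\rangle$ partition $\mathcal{M}$ into subsets, each of size $\lfloor |\mathcal{M}|/d\rfloor$ or $\lceil |\mathcal{M}|/d\rceil$.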

At first glance, this definition may seem rather unnatural, but we will discover some important 
properties of uniformly distributed rows from the DFT. As an example, we briefly consider uniform 
distribution in the context of the restricted isometry property (RIP). Recall that a matrix of random 
rows from a DFT and normalized columns is RIP with high probability [118]. We will show that 
harmonic frames satisfy RIP only if the selected row indices are nearly uniformly distributed over
sufficiently small divisors of $N$.

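To this end, recall that for any divisor $d$ of $N$, the Fourier transform of the $d$-sparse normalized Dirac comb $\frac{1}{\sqrt{d}}\chi_{\langle N/d\rangle}$ is the $\frac{N}{d}$-sparse normalized Dirac comb $\sqrt{\frac{d}{N}}\chi_{\langle d\rangle}$. Let $F$ be the $N \times N$ unitary DFT, and let $\Phi$ be the harmonic frame which arises from selecting rows of $F$ indexed by $\mathcal{M}$ and then normalizing the columns. In order for $\Phi$ to be $(K,\delta)$-RIP, $\mathcal{M}$ must contain at least one member of $\langle d\rangle$ for every divisor $d$ of $N$ which is $\leq K$, since otherwise

$$\Phi\,\tfrac{1}{\sqrt{d}}\chi_{\langle N/d\rangle} = \sqrt{\tfrac{d}{|\mathcal{M}|}}\,\chi_{\mathcal{M}\cap\langle d\rangle} = 0,$$

which violates the lower RIP bound at $x = \frac{1}{\sqrt{d}}\chi_{\langle N/d\rangle}$. In fact, the RIP bounds indicate that

$$\Big\|\Phi\,\tfrac{1}{\sqrt{d}}\chi_{\langle N/d\rangle}\Big\|^2 = \frac{d}{|\mathcal{M}|}\,|\mathcal{M}\cap\langle d\rangle|$$

cannot be more than $\delta$ away from $\|x\|^2 = 1$. Similarly, taking $x$ to be $\frac{1}{\sqrt{d}}\chi_{\langle N/d\rangle}$ modulated by $a$, i.e., $x[n] := \frac{1}{\sqrt{d}}\chi_{\langle N/d\rangle}[n]\,e^{2\pi i an/N}$ for every $n \in \mathbb{Z}_N$, gives that $\|\Phi x\|^2 = \frac{d}{|\mathcal{M}|}\,|\mathcal{M}\cap(a+\langle d\rangle)|$ is also no more than $\delta$ away from 1. This observation gives the following result:

Theorem 18. Select rows indexed by $\mathcal{M} \subseteq \mathbb{Z}_N$ from the $N \times N$ discrete Fourier transform matrix and then normalize the columns to produce the harmonic frame $\Phi$. Then $\Phi$ satisfies the $(K,\delta)$-restricted isometry property only if

$$\Big|\,|\mathcal{M}\cap(a+\langle d\rangle)| - \frac{|\mathcal{M}|}{d}\,\Big| \leq \delta\,\frac{|\mathcal{M}|}{d}$$

for every divisor $d$ of $N$ with $d \leq K$ and every $a = 0, \ldots, d-1$.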
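The identity $\|\Phi x\|^2 = \frac{d}{|\mathcal{M}|}|\mathcal{M}\cap(a+\langle d\rangle)|$ behind Theorem 18 can be checked numerically. The sketch below uses hypothetical helper names `comb_energy` and `predicted` and the DFT convention $e^{-2\pi i mn/N}/\sqrt{N}$. It forms the column-normalized harmonic frame, applies it to a modulated normalized Dirac comb, and compares the result with the closed form for every divisor of $N = 12$.

```python
import numpy as np


def comb_energy(N, rows, d, a):
    """||Phi x||^2 for x = normalized Dirac comb on <N/d>, modulated by a."""
    n = np.arange(N)
    F = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)   # unitary DFT
    Phi = F[rows, :] * np.sqrt(N / len(rows))                  # select rows, normalize columns
    comb = np.where(n % (N // d) == 0, 1 / np.sqrt(d), 0.0)    # d spikes, unit norm
    x = comb * np.exp(2j * np.pi * a * n / N)                  # modulation by a
    return np.linalg.norm(Phi @ x) ** 2


def predicted(N, rows, d, a):
    """Closed form (d / |M|) * |M intersect (a + <d>)| from the text."""
    hits = sum(1 for m in rows if (m - a) % d == 0)
    return d / len(rows) * hits


rows = [0, 1, 2, 5, 7, 8]                                      # an arbitrary index set in Z_12
for d in (1, 2, 3, 4, 6, 12):                                   # divisors of N = 12
    for a in range(d):
        assert np.isclose(comb_energy(12, rows, d, a), predicted(12, rows, d, a))

# Even rows only: the comb with d = 2 modulated by a = 1 is annihilated,
# so this harmonic frame cannot be (K, delta)-RIP for any K >= 2 and delta < 1.
print(predicted(12, [0, 2, 4, 6, 8, 10], d=2, a=1))            # 0.0
```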

Now that we have an intuition for uniform distribution in terms of modulated Dirac combs and 
RIP, we take this condition to the extreme by considering uniform distribution over all divisors. 
Doing so produces a complete characterization of full spark harmonic frames when $N$ is a prime
power: 

Theorem 19. Let $N$ be a prime power, and select rows indexed by $\mathcal{M} \subseteq \mathbb{Z}_N$ from the $N \times N$ discrete Fourier transform matrix to build the submatrix $\Phi$. Then $\Phi$ is full spark if and only if $\mathcal{M}$ is uniformly distributed over the divisors of $N$.
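For small prime powers, Theorem 19 can be confirmed exhaustively. The sketch below uses hypothetical helper names. For every index set in $\mathbb{Z}_8$, it compares a brute-force, floating-point full spark test on the selected DFT rows with the uniform distribution test of Definition 17, in which the cosets of $\langle d\rangle$ are the residue classes modulo $d$.

The tolerance `1e-9` is an assumption. Nonzero minors here are algebraic integers that stay well away from zero at this size, but the test is no substitute for exact arithmetic in general.

```python
import itertools

import numpy as np


def dft_rows_full_spark(N, rows, tol=1e-9):
    """Brute force: every square submatrix of the selected DFT rows is invertible."""
    A = np.exp(-2j * np.pi * np.outer(rows, np.arange(N)) / N)
    M = len(rows)
    return all(abs(np.linalg.det(A[:, list(cols)])) > tol
               for cols in itertools.combinations(range(N), M))


def uniformly_distributed(rows, N):
    """Definition 17, with the cosets of <d> taken as residue classes mod d."""
    M = len(rows)
    for d in [d for d in range(1, N + 1) if N % d == 0]:
        counts = [0] * d
        for m in rows:
            counts[m % d] += 1
        if any(c < M // d or c > -(-M // d) for c in counts):
            return False
    return True


N = 8
mismatches = [rows
              for M in range(1, N + 1)
              for rows in itertools.combinations(range(N), M)
              if dft_rows_full_spark(N, rows) != uniformly_distributed(rows, N)]
print(mismatches)   # Theorem 19 predicts an empty list
```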

Note that, perhaps surprisingly, an index set $\mathcal{M}$ can be uniformly distributed over $p$ but not over $p^2$, and vice versa. For example, $\mathcal{M} = \{0, 1, 4\}$ is uniformly distributed over 2 but not 4, while $\mathcal{M} = \{0, 2\}$ is uniformly distributed over 4 but not 2.
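Definition 17 is easy to test directly. In the sketch below, the function names are hypothetical, and the cosets of $\langle d\rangle$ in $\mathbb{Z}_N$ are taken to be the residue classes modulo $d$, which is valid because $d$ divides $N$. The printed checks reproduce the two examples above and the claim, argued in the next paragraph, that the first $M$ indices are uniformly distributed.

```python
def uniform_over(indices, d):
    """True if the d cosets of <d> (residue classes mod d) split `indices` into
    pieces of size floor(|indices| / d) or ceil(|indices| / d)."""
    counts = [0] * d
    for m in indices:
        counts[m % d] += 1
    lo, hi = len(indices) // d, -(-len(indices) // d)
    return all(lo <= c <= hi for c in counts)


def uniformly_distributed(indices, N):
    """Definition 17: uniform over every divisor d of N."""
    return all(uniform_over(indices, d) for d in range(1, N + 1) if N % d == 0)


print(uniform_over([0, 1, 4], 2), uniform_over([0, 1, 4], 4))   # True False
print(uniform_over([0, 2], 4), uniform_over([0, 2], 2))         # True False
print(uniformly_distributed(range(5), 16))                      # True: the first M indices
```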

Since the first $M$ rows of a DFT form a full spark Vandermonde matrix, let's check that this index set is uniformly distributed over the divisors of $N$. For each divisor $d$ of $N$, we partition the first $M$ indices into the $d$ cosets of $\langle d\rangle$. Write $M = qd + r$ with $0 \leq r < d$. The first $qd$ of the $M$ indices are distributed equally amongst all $d$ cosets, and then the remaining $r$ indices are distributed one apiece amongst the first $r$ cosets. Overall, the first $r$ cosets contain $q + 1 = \lfloor M/d\rfloor + 1$ indices, while the remaining $d - r$ cosets have $q = \lfloor M/d\rfloor$ indices; thus, the first $M$ indices are indeed uniformly distributed over the divisors of $N$. Also, when $N$ is prime, every subset of $\mathbb{Z}_N$ is uniformly distributed over the divisors of $N$ in a trivial sense. In fact, Chebotarev's theorem
follows immediately from Theorem 19. In some ways, portions of our proof of Theorem 19 mirror 
recurring ideas in the existing proofs of Chebotarev's theorem [59, 67, 126, 131]. For the sake of 
completeness, we provide the full argument and save the reader from having to parse portions of 
proofs from multiple references. We start with the following lemmas, whose proofs are based on the 
proofs of Lemmas 1.2 and 1.3 in [131]. 

Lemma 20. Let $N$ be a power of some prime $p$, and let $P(z_1, \ldots, z_M)$ be a polynomial with integer coefficients. Suppose there exist $N$th roots of unity $\{\omega_m\}_{m=1}^{M}$ such that $P(\omega_1, \ldots, \omega_M) = 0$. Then $P(1, \ldots, 1)$ is a multiple of $p$.
Proof. Denoting $\omega := e^{2\pi i/N}$, then for every $m = 1, \ldots, M$, we have $\omega_m = \omega^{k_m}$ for some $0 \leq k_m < N$. Defining the polynomial $Q(z) := P(z^{k_1}, \ldots, z^{k_M})$, we have $Q(\omega) = 0$ by assumption. Also, $Q(z)$ is a polynomial with integer coefficients, and so it must be divisible by the minimal polynomial of $\omega$, namely, the cyclotomic polynomial $\Phi_N(z)$. Evaluating both polynomials at $z = 1$ then gives that $p = \Phi_N(1)$ divides $Q(1) = P(1, \ldots, 1)$. □

Lemma 21. Let $N$ be a power of some prime $p$, and pick $\mathcal{M} = \{m_i\}_{i=1}^{M} \subseteq \mathbb{Z}_N$ such that

$$\frac{\prod_{1\leq i<j\leq M}(m_j - m_i)}{\prod_{m=0}^{M-1} m!} \tag{2.11}$$

is not a multiple of $p$. Then the rows indexed by $\mathcal{M}$ in the $N \times N$ discrete Fourier transform form a full spark frame.

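Proof. We wish to show that $\det(\omega_n^m)_{m\in\mathcal{M},\,1\leq n\leq M} \neq 0$ for all $M$-tuples of distinct $N$th roots of unity $\{\omega_n\}_{n=1}^{M}$. Define the polynomial $D(z_1, \ldots, z_M) := \det(z_n^m)_{m\in\mathcal{M},\,1\leq n\leq M}$. Since columns $i$ and $j$ of $(z_n^m)_{m\in\mathcal{M},\,1\leq n\leq M}$ are identical whenever $z_i = z_j$, we know that $D$ vanishes in each of these instances, and so we can factor:

$$D(z_1, \ldots, z_M) = P(z_1, \ldots, z_M)\prod_{1\leq i<j\leq M}(z_j - z_i)$$

for some polynomial $P(z_1, \ldots, z_M)$ with integer coefficients. It suffices to show that $P(1, \ldots, 1)$ is not a multiple of $p$. Indeed, by Lemma 20, this implies $P(\omega_1, \ldots, \omega_M) \neq 0$. Since $\prod_{i<j}(\omega_j - \omega_i)$ is nonzero for distinct roots, $D(\omega_1, \ldots, \omega_M)$ is then nonzero for all $M$-tuples of distinct $N$th roots of unity $\{\omega_n\}_{n=1}^{M}$.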

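To this end, we proceed by considering

$$A := \Big(z_1\frac{\partial}{\partial z_1}\Big)^{0}\Big(z_2\frac{\partial}{\partial z_2}\Big)^{1}\cdots\Big(z_M\frac{\partial}{\partial z_M}\Big)^{M-1}D(z_1, \ldots, z_M)\bigg|_{z_1=\cdots=z_M=1}. \tag{2.12}$$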

To compute $A$, we note that each application of $z_j\frac{\partial}{\partial z_j}$ produces terms according to the product rule. For some terms, a linear factor of the form $z_j - z_i$ or $z_i - z_j$ is replaced by $z_j$ or $-z_j$, respectively. For each of the other terms, these linear factors are untouched, while another factor, such as $P(z_1, \ldots, z_M)$, is differentiated and multiplied by $z_j$. Note that there are a total of $M(M-1)/2$ linear factors, and only $M(M-1)/2$ differentiation operators to apply. Thus, after expanding every product rule, there will be two types of terms:

- terms in which every differentiation operator was applied to a linear factor;
- terms which have at least one linear factor remaining untouched.

When we evaluate at $z_1 = \cdots = z_M = 1$, the terms with linear factors vanish, and so the only terms which remain came from applying every differentiation operator to a linear factor. Furthermore, each of these terms before the evaluation is of the form $P(z_1, \ldots, z_M)\prod_{1\leq i<j\leq M}z_j$, and so evaluation at $z_1 = \cdots = z_M = 1$ produces a sum of terms of the form $P(1, \ldots, 1)$. To determine the value of $A$, it remains to count these terms. The $M-1$ copies of $z_M\frac{\partial}{\partial z_M}$ can only be applied to linear factors of the form $z_M - z_i$, of which there are $M-1$, and so there are a total of $(M-1)!$ ways to distribute these operators. Similarly, there are $(M-2)!$ ways to distribute the $M-2$ copies of $z_{M-1}\frac{\partial}{\partial z_{M-1}}$ amongst the $M-2$ linear factors of the form $z_{M-1} - z_i$. Continuing in this manner produces an expression for $A$:

$$A = (M-1)!\,(M-2)!\cdots 1!\,0!\;P(1, \ldots, 1). \tag{2.13}$$

For an alternate expression of $A$, we substitute the definition of $D(z_1, \ldots, z_M)$ into (2.12). Here, we exploit the multilinearity of the determinant and the fact that $\big(z\frac{\partial}{\partial z}\big)z^m = mz^m$ to get

$$A = \det(m^{n-1})_{m\in\mathcal{M},\,1\leq n\leq M} = \prod_{1\leq i<j\leq M}(m_j - m_i), \tag{2.14}$$

where the final equality uses the fact that $(m^{n-1})_{m\in\mathcal{M},\,1\leq n\leq M}$ is the transpose of a Vandermonde matrix. Equating (2.13) to (2.14) reveals that (2.11) is an expression for $P(1, \ldots, 1)$. Thus, by assumption, $P(1, \ldots, 1)$ is not a multiple of $p$, and so we are done. □
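Since (2.11) is a ratio of integers, it can be evaluated exactly without floating point. The sketch below uses a hypothetical name, `expression_2_11`, and makes two choices:

- it treats the indices as integers in $\{0, \ldots, N-1\}$, sorted so that every factor $m_j - m_i$ is positive;
- it relies on Python's arbitrary-precision integers.

For the first $M$ indices, the numerator equals $1!\,2!\cdots(M-1)!$, so (2.11) equals 1.

```python
from itertools import combinations
from math import factorial, prod


def expression_2_11(indices):
    """Evaluate (2.11): prod_{i<j} (m_j - m_i) / prod_{m=0}^{M-1} m!."""
    ms = sorted(indices)
    numerator = prod(mj - mi for mi, mj in combinations(ms, 2))
    denominator = prod(factorial(m) for m in range(len(ms)))
    quotient, remainder = divmod(numerator, denominator)
    assert remainder == 0   # classical fact: this ratio of integers is always an integer
    return quotient


# Examples in Z_9, so p = 3.
print(expression_2_11(range(5)))     # 1 (first M indices)
print(expression_2_11([0, 1, 3]))    # 3: a multiple of p, so Lemma 21 gives no conclusion
print(expression_2_11([0, 1, 5]))    # 10: not a multiple of 3, so these rows are full spark
```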

Proof of Theorem 19. ($\Leftarrow$) We will use Lemma 21 to demonstrate that $\Phi$ is full spark. To apply this lemma, we need to establish that (2.11) is not a multiple of $p$. To do this, we will show that there are as many $p$-divisors in the numerator of (2.11) as there are in the denominator. We start by counting the $p$-divisors of the denominator:

$$\prod_{m=0}^{M-1} m! = \prod_{m=1}^{M-1}\prod_{\ell=1}^{m}\ell = \prod_{\ell=1}^{M-1}\prod_{m=\ell}^{M-1}\ell. \tag{2.15}$$

For each pair of integers $k, a \geq 1$, there are $\max\{M - ap^k, 0\}$ factors in (2.15) of the form $\ell = ap^k$. By adding these, we count each factor $\ell$ as many times as it can be expressed as a multiple of a power of $p$, which equals the number of $p$-divisors in $\ell$. Thus, the number of $p$-divisors of (2.15) is

$$\sum_{k=1}^{\infty}\sum_{a=1}^{\lfloor M/p^k\rfloor}(M - ap^k). \tag{2.16}$$
Next, we count the $p$-divisors of the numerator of (2.11). To do this, we use the fact that $\mathcal{M}$ is uniformly distributed over the divisors of $N$. Since $N$ is a power of $p$, the only divisors of $N$ are smaller powers of $p$. Also, the cosets of $\langle p^k\rangle$ partition $\mathcal{M}$ into subsets $S_{k,b} := \{m_i \equiv b \bmod p^k\}$. We note that $m_j - m_i$ is a multiple of $p^k$ precisely when $m_i$ and $m_j$ belong to the same subset $S_{k,b}$ for some $0 \leq b < p^k$. To count $p$-divisors, we again count each factor $m_j - m_i$ as many times as it can be expressed as a multiple of a prime power:

$$\sum_{k=1}^{\infty}\sum_{b=0}^{p^k-1}\binom{|S_{k,b}|}{2}. \tag{2.17}$$

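Write $M = qp^k + r$ with $0 \leq r < p^k$. Then $q = \lfloor M/p^k\rfloor$. Since $\mathcal{M}$ is uniformly distributed over $p^k$, there are $r$ subsets $S_{k,b}$ with $q+1$ elements and $p^k - r$ subsets with $q$ elements. We use this to get

$$\sum_{b=0}^{p^k-1}\binom{|S_{k,b}|}{2} = r\binom{q+1}{2} + (p^k - r)\binom{q}{2}.$$

Rearranging and substituting $M = qp^k + r$ then gives

$$\sum_{b=0}^{p^k-1}\binom{|S_{k,b}|}{2} = \frac{q}{2}\big(2M - (q+1)p^k\big) = qM - \frac{q(q+1)}{2}p^k = \sum_{a=1}^{\lfloor M/p^k\rfloor}(M - ap^k).$$

For $p^k > N$, both sides vanish, since the $m_i$ are distinct elements of $\{0, \ldots, N-1\}$ and $M \leq N < p^k$. Thus, there are as many $p$-divisors in the numerator (2.17) as there are in the denominator (2.16), and so (2.11) is not divisible by $p$. Lemma 21 therefore gives that $\Phi$ is full spark.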

($\Rightarrow$) We will prove that this direction holds regardless of whether $N$ is a prime power. Suppose $\mathcal{M} \subseteq \mathbb{Z}_N$ is not uniformly distributed over the divisors of $N$. Then there exists a divisor $d$ of $N$ such that one of the cosets of $\langle d\rangle$ intersects $\mathcal{M}$ with $\leq \lfloor M/d\rfloor - 1$ or $\geq \lceil M/d\rceil + 1$ indices. Notice that if a coset of $\langle d\rangle$ intersects $\mathcal{M}$ with $\leq \lfloor M/d\rfloor - 1$ indices, then the complement $\mathcal{M}^c$ intersects the same coset with $\geq \frac{N}{d} - \lfloor M/d\rfloor + 1 = \lceil\frac{N-M}{d}\rceil + 1$ indices. By Theorem 14(iii), $\mathcal{M}$ produces a full spark harmonic frame precisely when $\mathcal{M}^c$ produces a full spark harmonic frame. We may therefore assume without loss of generality that there exists a coset of $\langle d\rangle$ which intersects $\mathcal{M}$ with $\geq \lceil M/d\rceil + 1$ indices.

To prove that the rows with indices in $\mathcal{M}$ are not full spark, we find column entries which produce a singular submatrix. Writing $M = qd + r$ with $0 \leq r < d$, let $\mathcal{K}$ contain $q = \lfloor M/d\rfloor$ cosets of $\langle N/d\rangle$ along with $r$ elements from an additional coset. We claim that the DFT submatrix with row entries $\mathcal{M}$ and column entries $\mathcal{K}$ is singular. To see this, shuffle the rows and columns to form a matrix $A$ in which the row entries are grouped into common cosets of $\langle d\rangle$ and the column entries are grouped into common cosets of $\langle N/d\rangle$. This breaks $A$ into rank-1 submatrices: each pair of cosets $a + \langle d\rangle$ and $b + \langle N/d\rangle$ produces a submatrix

$$\Big(e^{-2\pi i(a+sd)(b+tN/d)/N}\Big)_{s\in\mathcal{I},\,t\in\mathcal{J}} = e^{-2\pi i ab/N}\Big(e^{-2\pi i bds/N}\,e^{-2\pi i at/d}\Big)_{s\in\mathcal{I},\,t\in\mathcal{J}}$$

for some index sets $\mathcal{I}$ and $\mathcal{J}$; this is a rank-1 outer product. Let $\mathcal{L}$ be the largest intersection between $\mathcal{M}$ and a coset of $\langle d\rangle$. Then $|\mathcal{L}| \geq \lceil M/d\rceil + 1$ is the number of rows in the tallest of these rank-1 submatrices. Define $A_{\mathcal{L}}$ to be the $M \times M$ matrix with entries $A_{\mathcal{L}}[i,j] = A[i,j]$ whenever $i \in \mathcal{L}$ and zero otherwise. Then

$$\mathrm{Rank}(A) = \mathrm{Rank}(A_{\mathcal{L}} + A - A_{\mathcal{L}}) \leq \mathrm{Rank}(A_{\mathcal{L}}) + \mathrm{Rank}(A - A_{\mathcal{L}}). \tag{2.18}$$

Since $A - A_{\mathcal{L}}$ has $|\mathcal{L}|$ rows of zero entries, we also have

$$\mathrm{Rank}(A - A_{\mathcal{L}}) \leq M - |\mathcal{L}| \leq M - \big(\lceil M/d\rceil + 1\big). \tag{2.19}$$

Moreover, since we can decompose $A_{\mathcal{L}}$ into a sum of $\lceil M/d\rceil$ zero-padded rank-1 submatrices, we have $\mathrm{Rank}(A_{\mathcal{L}}) \leq \lceil M/d\rceil$. Combining this with (2.18) and (2.19) then gives that $\mathrm{Rank}(A) \leq M - 1$, and so the DFT submatrix is not invertible. □
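The counting in the ($\Leftarrow$) direction can be spot-checked by brute force. The sketch below uses hypothetical helper names and performs three checks:

- it counts the $p$-divisors of the numerator of (2.11) directly and evaluates (2.16) for the denominator;
- it confirms that (2.16) agrees with a direct count of the $p$-divisors of $\prod_{m=0}^{M-1}m!$;
- it asserts that the two counts coincide for every uniformly distributed subset of $\mathbb{Z}_8$ and $\mathbb{Z}_9$.

This is a finite check of the argument, not a replacement for it.

```python
from itertools import combinations


def vp(n, p):
    """Exponent of the prime p in the nonzero integer n."""
    n, count = abs(n), 0
    while n % p == 0:
        n //= p
        count += 1
    return count


def denominator_count(M, p):
    """(2.16): sum over k >= 1 and 1 <= a <= floor(M / p^k) of (M - a p^k)."""
    total, pk = 0, p
    while pk <= M:
        total += sum(M - a * pk for a in range(1, M // pk + 1))
        pk *= p
    return total


def numerator_count(indices, p):
    """Number of p-divisors of prod_{i<j} (m_j - m_i), counted directly."""
    return sum(vp(mj - mi, p) for mi, mj in combinations(sorted(indices), 2))


def uniformly_distributed(indices, N):
    """Definition 17, with the cosets of <d> taken as residue classes mod d."""
    M = len(indices)
    for d in [d for d in range(1, N + 1) if N % d == 0]:
        counts = [0] * d
        for m in indices:
            counts[m % d] += 1
        if any(c < M // d or c > -(-M // d) for c in counts):
            return False
    return True


for N, p in ((8, 2), (9, 3)):
    for M in range(1, N + 1):
        # (2.16) agrees with the direct count sum_{l=1}^{M-1} (M - l) v_p(l)
        assert denominator_count(M, p) == sum((M - l) * vp(l, p) for l in range(1, M))
        for S in combinations(range(N), M):
            if uniformly_distributed(S, N):
                assert numerator_count(S, p) == denominator_count(M, p)
print("counts agree")
```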

Note that our proof of Theorem 19 establishes the necessity of having row indices uniformly distributed over the divisors of $N$ in the general case. This leaves some hope for completely characterizing full spark harmonic frames. Naturally, one might suspect that the uniform distribution condition is sufficient in general, but this suspicion fails when $N = 10$. Indeed, the following DFT submatrix is singular despite the row indices being uniformly distributed over the divisors of 10:

$$\big(e^{-2\pi i mn/10}\big)_{m\in\{0,1,3,4\},\;n\in\{0,1,2,6\}}.$$

Just as we used Chebotarev's theorem to analyze the harmonic equiangular tight frames from Xia et al. [146], we can also use Theorem 19 to determine whether harmonic equiangular tight frames with a prime power number of frame elements are full spark. Unfortunately, none of the infinite families in [146] have the number of frame elements in the form of a prime power (other than primes). Luckily, there is at least one instance in which the number of frame elements happens to be a prime power. The harmonic frames that arise from Singer difference sets have $M = \frac{q^d - 1}{q - 1}$ and $N = \frac{q^{d+1} - 1}{q - 1}$ for a prime power $q$ and an integer $d \geq 2$. When $q = 3$ and $d = 4$, the number of frame elements $N = 11^2$ is a prime power. In this case, the row indices we select are

$$\mathcal{M} = \{1, 2, 3, 6, 7, 9, 11, 18, 20, 21, 25, 27, 33, 34, 38, 41, 44, 47, 53, 54, 55, 56, 58, 59, 60, 63, 64, 68, 70, 71, 75, 81, 83, 89, 92, 99, 100, 102, 104, 114\},$$

but these are not uniformly distributed over 11, and so the corresponding harmonic frame is not full 
spark by Theorem 19. 
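The failure of uniform distribution over 11 is easy to see by tallying residues. The sketch below works through three steps:

- it recomputes $N = 121$ and $M = 40$ from $q = 3$ and $d = 4$;
- it counts how many of the listed indices fall in each coset of $\langle 11\rangle$;
- it compares each count with $\lfloor 40/11\rfloor = 3$ and $\lceil 40/11\rceil = 4$.

```python
from collections import Counter

q, d = 3, 4                                        # Singer parameters (d is not a divisor here)
N, M = (q ** (d + 1) - 1) // (q - 1), (q ** d - 1) // (q - 1)
singer = [1, 2, 3, 6, 7, 9, 11, 18, 20, 21, 25, 27, 33, 34, 38, 41, 44, 47, 53, 54,
          55, 56, 58, 59, 60, 63, 64, 68, 70, 71, 75, 81, 83, 89, 92, 99, 100, 102, 104, 114]
counts = Counter(m % 11 for m in singer)           # intersections with the cosets of <11>
lo, hi = M // 11, -(-M // 11)                      # floor and ceil of 40/11, i.e. 3 and 4
print(N, M, len(singer))                           # 121 40 40
print(sorted(counts[b] for b in range(11)))        # [2, 2, 2, 2, 2, 5, 5, 5, 5, 5, 5]
print(all(lo <= counts[b] <= hi for b in range(11)))   # False: not uniform over 11
```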

2.2 The computational complexity of verifying full spark 

In the previous section, we constructed a large collection of deterministic full spark frames. To see 
how special these constructions are, we consider the following question: How much computation is 
required to check whether any given frame is full spark? At the heart of the matter is computational 
complexity theory, which provides a rigorous playing field for expressing how hard certain problems 
are. In this section, we consider the complexity of the following problem: 

Problem 22 (Full Spark). Given a matrix, is it full spark? 

For the lay mathematician, Full Spark is "obviously" NP-hard because the easiest way he can
think to solve it for a given $M \times N$ matrix is by determining whether each of the $M \times M$ submatrices
is invertible; computing $\binom{N}{M}$ determinants would do, but this would take a lot of time, and so Full
Spark must be NP-hard. However, computing $\binom{N}{M}$ determinants may not necessarily be the fastest
way to test whether a matrix is full spark. For example, perhaps there is an easy-to-calculate 
expression for the product of the determinants; after all, this product is nonzero precisely when the 
matrix is full spark. Recall that Theorem 19 gives a very straightforward litmus test for Full Spark 
in the special case where the matrix is formed by rows of a DFT of prime-power order — who's to 
say that a version of this test does not exist for the general case? If such a test exists, then it would 
suffice to find it, but how might one disprove the existence of any such test? Indeed, since we are 
concerned with the necessary amount of computation, as opposed to a sufficient amount, the lay 
mathematician's intuition is a bit misguided. 
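The lay mathematician's approach is easy to write down. The following sketch uses hypothetical helper names. It tests Full Spark for an integer matrix by computing all $\binom{N}{M}$ maximal minors exactly with rational Gaussian elimination. The test is correct but requires exponentially many determinants in general, which is the cost discussed above; it says nothing about whether a faster test exists.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def det_exact(rows):
    """Determinant of a square integer matrix by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in rows]
    n, det = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return det


def is_full_spark(matrix):
    """Naive test: all C(N, M) maximal minors of the M x N matrix are nonzero."""
    M, N = len(matrix), len(matrix[0])
    return all(det_exact([[row[j] for j in cols] for row in matrix]) != 0
               for cols in combinations(range(N), M))


V = [[b ** k for b in range(1, 6)] for k in range(3)]   # 3 x 5 Vandermonde, bases 1..5
print(is_full_spark(V))                                 # True
print(is_full_spark([[1, 2, 3], [2, 4, 5]]))            # False: columns 0 and 1 are parallel
print(comb(40, 20))                                     # minors to check for a 20 x 40 matrix
```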

To discern how much computation is necessary, the main feature of interest is a problem's com- 
plexity. We use complexity to compare problems and determine whether one is harder than the 
other. As an example of complexity, intuitively, doubling an integer is no harder than adding integers, since one can use addition to multiply by 2; put another way, the complexity of doubling
is somehow "encoded" in the complexity of adding, and so it must be lesser (or equal). To make
this more precise, complexity theorists use what is called a polynomial-time reduction, that is, a 
polynomial-time algorithm that solves problem A by exploiting an oracle which solves problem B; 
the reduction indicates that solving problem A is no harder than solving problem B (up to polynomial factors in time), and we say "A reduces to B," or $A \leq B$. Since we can use the polynomial-time
routine $x + x$ to produce $2x$, we conclude that doubling an integer reduces to adding integers, as
expected.

In complexity theory, problems are categorized into complexity classes according to the amount 
of resources required to solve them. For example, the complexity class P contains all problems 
which can be solved in polynomial time, while problems in EXP may require as much as exponential 
time. Problems in NP have the defining quality that solutions can be verified in polynomial time 
given a certificate for the answer. As an example, the graph isomorphism problem is in NP because, 
given an isomorphism between graphs (a certificate), one can verify that the isomorphism is legit in 
polynomial time. Clearly, P $\subseteq$ NP, since we can ignore the certificate and still solve the problem in
polynomial time. Finally, a problem B is called NP-hard if every problem A in NP reduces to B, and 
a problem is called NP-complete if it is both NP-hard and in NP. In plain speak, NP-hard problems 
are harder than every problem in NP, while NP-complete problems are the hardest of problems in 
NP. 

At this point, it should be clear that NP-hard problems are not merely problems that seem to 
require a lot of computation to solve. Certainly, NP-hard problems have this quality, as an NP-hard 
problem can be solved in polynomial time only if P = NP; this is an open problem, but it is widely 
believed that P $\neq$ NP. However, there are other problems which seem hard but are not known
to be NP-hard (e.g., the graph isomorphism problem). Rather, to determine whether a problem is
NP-hard, one must find a polynomial-time reduction that compares the problem to all problems in
NP. To this end, notice that $A \leq B$ and $B \leq C$ together imply $A \leq C$, and so to demonstrate that
a problem $C$ is NP-hard, it suffices to show that $B \leq C$ for some NP-hard problem $B$.

Unfortunately, it can sometimes be difficult to find a deterministic reduction from one problem 
to another. One example is reducing the satisfiability problem (SAT) to the unique satisfiability 
problem (Unique SAT). To be clear, SAT is an NP-hard problem [89] that asks whether there 
exists an input for which a given Boolean function returns "true," while Unique SAT asks the 
same question with an additional promise: that the given Boolean function is satisfiable only if 
there is a unique input for which it returns "true." Intuitively, Unique SAT is easier than SAT 
because we might be able to exploit the additional structure of uniquely satisfiable Boolean functions; 
thus, it could be difficult to find a reduction from SAT to Unique SAT. Despite this intuition,
there is a randomized polynomial-time reduction from SAT to Unique SAT [138]. Defined over
all Boolean functions of $n$ variables, the reduction maps functions that are not satisfiable to other
functions that are not satisfiable, and with probability $\geq \frac{1}{8n}$, it maps satisfiable functions to uniquely
satisfiable functions. After applying this reduction to a given Boolean function, if a Unique SAT
oracle declares "uniquely satisfiable," then we know for certain that the original Boolean function
was satisfiable. But the reduction will only map a satisfiable problem to a uniquely satisfiable
problem with probability $\geq \frac{1}{8n}$, so what good is this reduction? The answer lies in something called
amplification. Since the success probability is, at worst, polynomially small in $n$ (i.e., $\geq \frac{1}{8n}$), we
can repeat our oracle-based randomized algorithm a polynomial number of times $8np(n)$ and achieve
an error probability $\leq (1 - \frac{1}{8n})^{8np(n)} \leq e^{-p(n)}$, which is exponentially small.

In this section, we give a randomized polynomial-time reduction from a problem in matroid 
theory. Before stating the problem, we first briefly review some definitions. To each bipartite graph
with bipartition $(E, E')$, we associate a transversal matroid $(E, \mathcal{I})$, where $\mathcal{I}$ is the collection of
subsets of $E$ whose vertices form the ends of a matching in the bipartite graph; subsets in $\mathcal{I}$ are
called independent. Next, just as spark is the size of the smallest linearly dependent set, the girth of
a matroid is the size of the smallest subset of $E$ that is not in $\mathcal{I}$. In fact, this analogy goes deeper:
A matroid is representable over a field $\mathbb{F}$ if, for some $M$, there exists a mapping $\psi\colon E \to \mathbb{F}^M$ such
that $\psi(A)$ is linearly independent if and only if $A \in \mathcal{I}$; as such, the girth of $(E, \mathcal{I})$ is the spark of
$\psi(E)$. In our reduction, we make use of the fact that every transversal matroid is representable over
$\mathbb{R}$ [112]. We are now ready to state the problem from which we will reduce Full Spark:

Problem 23. Given a bipartite graph, what is the girth of its transversal matroid? 
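For very small graphs, both the girth and the spark of a representation can be computed by brute force. The sketch below illustrates these quantities and is not part of the reduction. It does two things:

- it finds the girth of a transversal matroid by searching for the smallest subset of $E$ with no matching, using Kuhn's augmenting-path algorithm;
- it draws the random integer representation used in the proof of Theorem 24 below (entries from $\{1, \ldots, N2^{N+1}\}$ on edges, zeros elsewhere) and computes its spark exactly.

Function names are hypothetical. Vertices of $E$ index columns and vertices of $E'$ index rows.

```python
import random
from fractions import Fraction
from itertools import combinations


def has_matching(subset, adj):
    """Kuhn's algorithm: can every vertex in `subset` be matched to a distinct neighbor?"""
    match = {}                                   # neighbor -> matched vertex of `subset`

    def augment(a, seen):
        for b in adj[a]:
            if b not in seen:
                seen.add(b)
                if b not in match or augment(match[b], seen):
                    match[b] = a
                    return True
        return False

    return all(augment(a, set()) for a in subset)


def girth(n_e, adj):
    """Size of the smallest subset of E = {0, ..., n_e - 1} that is not independent."""
    for s in range(1, n_e + 1):
        for S in combinations(range(n_e), s):
            if not has_matching(S, adj):
                return s
    return None                                  # every subset is independent


def rank_exact(rows):
    """Rank over the rationals by Gauss-Jordan elimination."""
    a = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for c in range(len(a[0])):
        pivot = next((r for r in range(rank, len(a)) if a[r][c] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(len(a)):
            if r != rank and a[r][c] != 0:
                f = a[r][c] / a[rank][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


def spark(Phi):
    """Size of the smallest linearly dependent set of columns (None if there is none)."""
    n_cols = len(Phi[0])
    for s in range(1, n_cols + 1):
        for S in combinations(range(n_cols), s):
            if rank_exact([[row[j] for j in S] for row in Phi]) < s:
                return s
    return None


def random_representation(n_rows, adj, rng):
    """Entry (i, j) uniform on {1, ..., N 2^(N+1)} if i is adjacent to j, else 0 (N = |E|)."""
    N = len(adj)
    top = N * 2 ** (N + 1)
    return [[rng.randint(1, top) if i in adj[j] else 0 for j in range(N)]
            for i in range(n_rows)]


adj = [{0}, {0}, {1, 2}, {1, 2}]                 # neighbors in E' = {0, 1, 2} of each vertex of E
Phi = random_representation(3, adj, random.Random(0))
print(girth(4, adj), spark(Phi))                 # 2 2: columns 0 and 1 both live on row 0
```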

Before giving the reduction, we note that Problem 23 is NP-hard. This is demonstrated in 
McCormick's thesis [100], which credits the proof to Stockmeyer; since [100] is difficult to access, 
we refer the reader to [2]. We now turn to the main result of this section; note that our proof is 
specifically geared toward the case where the matrix in question has integer entries — this is stronger 
than manipulating real (complex) numbers exactly as well as with truncations and tolerances. 

Theorem 24. Full Spark is hard for NP under randomized polynomial-time reductions. 

Proof. We will give a randomized polynomial-time reduction from Problem 23 to Full Spark. As such, suppose we are given a bipartite graph $G$, in which every edge is between the disjoint sets $A$ and $B$. Take $M := |B|$ and $N := |A|$. Using this graph, we randomly draw an $M \times N$ matrix $\Phi$ using the following process: for each $i \in B$ and $j \in A$, pick the entry $\Phi_{ij}$ randomly from $\{1, \ldots, N2^{N+1}\}$ if $i \leftrightarrow j$ in $G$; otherwise set $\Phi_{ij} = 0$. In Proposition 3.11 of [99], it is shown that the columns of $\Phi$ form a representation of the transversal matroid of $G$ with probability $\geq \frac{1}{2}$. For the moment, we assume that $\Phi$ succeeds in representing the matroid.

Since the girth of the original matroid equals the spark of its representation, for each $K = 1, \ldots, M$, we test whether $\mathrm{Spark}(\Phi) > K$. To do this, take $H$ to be some $M \times P$ full spark frame. We will determine an appropriate value for $P$ later, but for simplicity, we can take $H$ to be the Vandermonde matrix formed from bases $\{1, \ldots, P\}$; see Lemma 12. We claim we can randomly select $K$ indices $\mathcal{K} \subseteq \{1, \ldots, P\}$ and test whether $H_{\mathcal{K}}^*\Phi$ is full spark to determine whether $\mathrm{Spark}(\Phi) > K$. Moreover, after performing this test for each $K = 1, \ldots, M$, the probability of incorrectly determining $\mathrm{Spark}(\Phi)$ is $\leq \frac{1}{2}$, provided $P$ is sufficiently large.

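We want to test whether $H_{\mathcal{K}}^*\Phi$ is full spark and use the result as a proxy for whether $\mathrm{Spark}(\Phi) > K$. For this to work, we need to have $\mathrm{Rank}(H_{\mathcal{K}}^*\Phi_{\mathcal{K}'}) = K$ precisely when $\mathrm{Rank}(\Phi_{\mathcal{K}'}) = K$ for every $\mathcal{K}' \subseteq \{1, \ldots, N\}$ of size $K$. To this end, it suffices to have the nullspace $\mathcal{N}(H_{\mathcal{K}}^*)$ of $H_{\mathcal{K}}^*$ intersect trivially with the column space of $\Phi_{\mathcal{K}'}$ for every $\mathcal{K}'$. To be clear, it is always the case that $\mathrm{Rank}(H_{\mathcal{K}}^*\Phi_{\mathcal{K}'}) \leq \mathrm{Rank}(\Phi_{\mathcal{K}'})$, and so $\mathrm{Rank}(\Phi_{\mathcal{K}'}) < K$ implies $\mathrm{Rank}(H_{\mathcal{K}}^*\Phi_{\mathcal{K}'}) < K$. If we further assume that $\mathcal{N}(H_{\mathcal{K}}^*) \cap \mathrm{Span}(\Phi_{\mathcal{K}'}) = \{0\}$, then the converse also holds. To see this, suppose $\mathrm{Rank}(H_{\mathcal{K}}^*\Phi_{\mathcal{K}'}) < K$. Then by the rank-nullity theorem, there is a nontrivial $x \in \mathcal{N}(H_{\mathcal{K}}^*\Phi_{\mathcal{K}'})$. Since $H_{\mathcal{K}}^*\Phi_{\mathcal{K}'}x = 0$, we must have $\Phi_{\mathcal{K}'}x \in \mathcal{N}(H_{\mathcal{K}}^*)$. This in turn implies $x \in \mathcal{N}(\Phi_{\mathcal{K}'})$, since $\mathcal{N}(H_{\mathcal{K}}^*) \cap \mathrm{Span}(\Phi_{\mathcal{K}'}) = \{0\}$ by assumption. Thus, $\mathrm{Rank}(\Phi_{\mathcal{K}'}) < K$ by the rank-nullity theorem.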

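Now fix $\mathcal{K}' \subseteq \{1, \ldots, N\}$ of size $K$ such that $\mathrm{Rank}(\Phi_{\mathcal{K}'}) = K$. We will show that the vast majority of choices $\mathcal{K} \subseteq \{1, \ldots, P\}$ of size $K$ satisfy $\mathcal{N}(H_{\mathcal{K}}^*) \cap \mathrm{Span}(\Phi_{\mathcal{K}'}) = \{0\}$. To do this, we consider the columns $\{h_k\}_{k\in\mathcal{K}}$ of $H_{\mathcal{K}}$ one at a time, and we make use of the fact that $\mathcal{N}(H_{\mathcal{K}}^*) = \bigcap_{k\in\mathcal{K}}\mathcal{N}(h_k^*)$. In particular, since $H$ is full spark, there are at most $M - K$ columns of $H$ in the orthogonal complement of $\mathrm{Span}(\Phi_{\mathcal{K}'})$. There are therefore at least $P - (M - K)$ choices of $h_{k_1}$ for which $\mathcal{N}(h_{k_1}^*)$ does not contain $\mathrm{Span}(\Phi_{\mathcal{K}'})$, i.e.,

$$\dim\big(\mathcal{N}(h_{k_1}^*) \cap \mathrm{Span}(\Phi_{\mathcal{K}'})\big) = K - 1.$$

Similarly, after selecting the first $J$ vectors $h_k$, we have $\dim(S) = K - J$, where

$$S := \bigcap_{j=1}^{J}\mathcal{N}(h_{k_j}^*) \cap \mathrm{Span}(\Phi_{\mathcal{K}'}).$$

Again, since $H$ is full spark, there are at most $M - (K - J)$ columns of $H$ in the orthogonal complement of $S$. The remaining $P - (M - (K - J))$ columns are therefore candidates for $h_{k_{J+1}}$ that give

$$\dim\Big(\bigcap_{j=1}^{J+1}\mathcal{N}(h_{k_j}^*) \cap \mathrm{Span}(\Phi_{\mathcal{K}'})\Big) = \dim\big(\mathcal{N}(h_{k_{J+1}}^*) \cap S\big) = K - (J+1).$$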
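Overall, if we randomly pick $\mathcal{K} \subseteq \{1, \ldots, P\}$ of size $K$, then

$$\begin{aligned}
\Pr\big(\mathcal{N}(H_{\mathcal{K}}^*) \cap \mathrm{Span}(\Phi_{\mathcal{K}'}) = \{0\}\big)
&\geq \Big(1 - \frac{M-K}{P}\Big)\Big(1 - \frac{M-K+1}{P-1}\Big)\cdots\Big(1 - \frac{M-1}{P-K+1}\Big)\\
&\geq \Big(1 - \frac{M}{P}\Big)^{K} \geq 1 - \frac{MK}{P}.
\end{aligned}$$

Here the middle step holds because each factor is at least $1 - \frac{M}{P}$ whenever $P \geq MK$, and the final step is by Bernoulli's inequality. Taking a union bound over all choices of $\mathcal{K}' \subseteq \{1, \ldots, N\}$ and all values of $K = 1, \ldots, M$ then gives

$$\Pr\big(\text{fail to determine }\mathrm{Spark}(\Phi)\big) \leq \sum_{K=1}^{M}\sum_{\mathcal{K}'}\Pr\big(\mathcal{N}(H_{\mathcal{K}}^*) \cap \mathrm{Span}(\Phi_{\mathcal{K}'}) \neq \{0\}\big) \leq \sum_{K=1}^{M}\binom{N}{K}\frac{MK}{P} \leq \frac{M^2 2^N}{P}.$$

Thus, to make the probability of failure $\leq \frac{1}{2}$, it suffices to have $P = M^2 2^{N+1}$.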

In summary, we succeed in representing the original matroid with probability $\geq \frac{1}{2}$, and then we succeed in determining the spark of its representation with probability $\geq \frac{1}{2}$. The probability of overall success is therefore $\geq \frac{1}{4}$. Since our success probability is, at worst, polynomially small, we can apply amplification to achieve an exponentially small error probability. □
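The inner loop of this reduction can be sketched for tiny integer matrices. In the hypothetical function `spark_via_projections` below, $H$ is the $M \times P$ Vandermonde frame with bases $1, \ldots, P$ and $P = M^2 2^{N+1}$. Because $H$ is real, $H_{\mathcal{K}}^* = H_{\mathcal{K}}^T$. For $K = 1, \ldots, M$, the sketch projects onto $K$ randomly chosen columns of $H$ and reports the first $K$ for which $H_{\mathcal{K}}^T\Phi$ fails to be full spark.

The error is one-sided. If $\mathrm{Spark}(\Phi) \leq K$, the test always fails; for smaller $K$ it passes with high probability. Exact rational arithmetic is used, and the two usage examples are deterministic regardless of the random draw.

```python
import random
from fractions import Fraction
from itertools import combinations


def rank_exact(rows):
    """Rank over the rationals by Gauss-Jordan elimination."""
    a = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for c in range(len(a[0])):
        pivot = next((r for r in range(rank, len(a)) if a[r][c] != 0), None)
        if pivot is None:
            continue
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(len(a)):
            if r != rank and a[r][c] != 0:
                f = a[r][c] / a[rank][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[rank])]
        rank += 1
    return rank


def is_full_spark(matrix):
    """Every K columns of the K x N matrix are linearly independent (exact arithmetic)."""
    K, N = len(matrix), len(matrix[0])
    return all(rank_exact([[row[j] for j in S] for row in matrix]) == K
               for S in combinations(range(N), K))


def spark_via_projections(Phi, rng):
    """Return the first K for which H_K^T Phi is not full spark (M + 1 if none)."""
    M, N = len(Phi), len(Phi[0])
    P = M * M * 2 ** (N + 1)
    for K in range(1, M + 1):
        bases = rng.sample(range(1, P + 1), K)   # random columns (1, b, ..., b^(M-1)) of H
        proj = [[sum(b ** i * Phi[i][j] for i in range(M)) for j in range(N)] for b in bases]
        if not is_full_spark(proj):
            return K       # with high probability, K = Spark(Phi)
    return M + 1           # no dependent set of size <= M was detected


rng = random.Random(1)
print(spark_via_projections([[1, 1, 0], [0, 0, 1]], rng))   # 2
print(spark_via_projections([[1, 0, 1], [0, 1, 1]], rng))   # 3
```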

Our use of random linear projections in the above reduction to Full Spark is similar in spirit to 
Valiant and Vazirani's use of random hash functions in their reduction to Unique SAT [138]. Since
their randomized reduction is the canonical example thereof, we find our reduction to be particularly 
natural. 

To conclude this section, we clarify that Theorem 24 is a statement about the amount of computation necessary in the worst case. Indeed, the hardness of Full Spark does not rule out the existence of smaller classes of matrices for which full spark is easily determined. As an example, Theorem 19 determines Full Spark in the special case where the matrix is formed by rows of a DFT of prime-power order. This illustrates the utility of applying additional structure to efficiently solve the Full Spark problem, and indeed, such classes of matrices are rather special for this reason.

2.3 Phaseless recovery with polarization 

In the previous sections, we constructed deterministic full spark frames and showed that checking 
for full spark in general is computationally hard. In this section, we provide a new technique 
for phaseless recovery which makes use of full spark frames in the measurement design. We are 
particularly interested in using the fewest measurements necessary for recovery, namely $N = O(M)$,
where M is the dimension of the signal [15]. 

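Take a finite set $V$, and suppose we take phaseless measurements of $x \in \mathbb{C}^M$ with a frame $\Phi_V := \{\varphi_i\}_{i\in V} \subseteq \mathbb{C}^M$ with the task of recovering $x$ up to a global phase factor. For notational convenience, we take $\sim$ to be the equivalence relation of being identical up to a global phase factor, and we say $y$ is a member of the equivalence class $[x] \in \mathbb{C}^M/{\sim}$ if $y \sim x$. Having $|\langle x, \varphi_i\rangle|$ for every $i \in V$, we claim it suffices to determine the relative phase between all pairs of frame coefficients. If we had this information, we could arbitrarily assign some nonzero frame coefficient $c_i = |\langle x, \varphi_i\rangle|$ to have positive phase. If $\langle x, \varphi_j\rangle$ is also nonzero, then it has well-defined relative phase

$$\omega_{ij} := \left(\frac{\langle x, \varphi_i\rangle}{|\langle x, \varphi_i\rangle|}\right)^{-1}\frac{\langle x, \varphi_j\rangle}{|\langle x, \varphi_j\rangle|},$$

which determines the frame coefficient by multiplication: $c_j = \omega_{ij}|\langle x, \varphi_j\rangle|$. Otherwise, when $\langle x, \varphi_j\rangle = 0$, we naturally take $c_j = 0$, and for notational convenience, we arbitrarily take $\omega_{ij} = 1$. From here, $[x] \in \mathbb{C}^M/{\sim}$ can be identified by applying the canonical dual frame $\{\tilde{\varphi}_j\}_{j\in V}$ of $\Phi_V$:

$$\sum_{j\in V}c_j\tilde{\varphi}_j = \sum_{j\in V}\omega_{ij}|\langle x, \varphi_j\rangle|\tilde{\varphi}_j = \left(\frac{\langle x, \varphi_i\rangle}{|\langle x, \varphi_i\rangle|}\right)^{-1}\sum_{j\in V}\langle x, \varphi_j\rangle\tilde{\varphi}_j = \left(\frac{\langle x, \varphi_i\rangle}{|\langle x, \varphi_i\rangle|}\right)^{-1}x \sim x.$$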

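To find the relative phase between frame coefficients, we turn to the polarization identity, with the inner product taken to be linear in its first argument and conjugate-linear in its second:

$$\overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle = \frac{1}{4}\sum_{k=0}^{3}i^k\big|\langle x, \varphi_i\rangle + i^{-k}\langle x, \varphi_j\rangle\big|^2 = \frac{1}{4}\sum_{k=0}^{3}i^k\big|\langle x, \varphi_i + i^k\varphi_j\rangle\big|^2.$$

Thus, if in addition to $\Phi_V$ we measure with $\{\varphi_i + i^k\varphi_j\}_{k=0}^{3}$, we can use the above calculation to determine $\overline{\langle x, \varphi_i\rangle}\langle x, \varphi_j\rangle$. Normalizing then gives the relative phase $\omega_{ij}$, provided both $\langle x, \varphi_i\rangle$ and $\langle x, \varphi_j\rangle$ are nonzero. To summarize, if we measure with $\Phi_V$ and $\{\varphi_i + i^k\varphi_j\}_{k=0}^{3}$ for every pair $i, j \in V$, then we can recover $[x]$. However, such a method uses $|V| + 4\binom{|V|}{2}$ measurements. Since $\Phi_V$ is a frame, we necessarily have $|V| \geq M$, and thus a total of $\Omega(M^2)$ measurements.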

In pursuit of $O(M)$ measurements, take some simple graph $G = (V, E)$, and only take measurements with $\Phi_V$ and $\Phi_E := \bigcup_{(i,j)\in E}\{\varphi_i + i^k\varphi_j\}_{k=0}^{3}$. To recover $[x]$, we again arbitrarily assign
some nonzero vertex measurement to have positive phase, and then we propagate relative phase 
information along the edges by multiplication to determine the phase of the other vertex measure- 
ments relative to the original vertex measurement. However, if x is orthogonal to a given vertex 
vector, then that measurement is zero, and so relative phase information cannot propagate through 
the corresponding vertex; indeed, such orthogonality has the effect of removing the vertex from the 
graph, and for some graphs, this will prevent recovery. For example, if G is a star, then x could 
be orthogonal to the vector corresponding to the internal vertex, whose removal would render the 
remaining graph edgeless. That said, we should select $y and G so as to minimize the impact of 
orthogonality with vertex vectors. 
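The propagation step can be sketched as a breadth-first search: starting from one vertex with a nonzero measurement, each edge product $\langle x,\varphi_i\rangle\overline{\langle x,\varphi_j\rangle}$ (which in practice would come from the polarization measurements) fixes the next coefficient up to the common phase, vertices with zero measurements are skipped, and a least-squares solve plays the role of the canonical dual frame. The function name, the cycle graph, and the random frame are illustrative assumptions; the cycle only keeps the example small and is not an expander.

```python
import numpy as np
from collections import deque


def inner(u, v):
    """<u, v>, linear in u and conjugate-linear in v."""
    return np.vdot(v, u)


def propagate_and_recover(Phi, mags, products, edges, tol=1e-12):
    """Recover x up to a global phase from vertex magnitudes |<x, phi_v>| and
    edge products <x, phi_i> * conj(<x, phi_j>) (hypothetical helper)."""
    n = Phi.shape[1]
    nbrs = {v: [] for v in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # Give one nonzero vertex coefficient positive phase
    root = next(v for v in range(n) if mags[v] > tol)
    c = {root: complex(mags[root])}
    queue = deque([root])
    while queue:
        i = queue.popleft()
        for j in nbrs[i]:
            if j in c or mags[j] <= tol:
                continue  # zero measurements block phase propagation
            p = products[(i, j)] if (i, j) in products else np.conj(products[(j, i)])
            # p = a_i conj(a_j), so a_j = conj(p) a_i / |a_i|^2
            c[j] = np.conj(p) * c[i] / mags[i] ** 2
            queue.append(j)
    reached = sorted(c)
    # c_j = phi_j^H (x e^{-i theta}); solve for x e^{-i theta} by least squares
    A = Phi[:, reached].conj().T
    y, *_ = np.linalg.lstsq(A, np.array([c[j] for j in reached]), rcond=None)
    return y, reached


rng = np.random.default_rng(1)
M, n = 3, 8
Phi = rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))  # generic frame
x = rng.standard_normal(M) + 1j * rng.standard_normal(M)
edges = [(v, (v + 1) % n) for v in range(n)]  # a cycle, for illustration only
coeffs = np.array([inner(x, Phi[:, v]) for v in range(n)])
mags = np.abs(coeffs)
products = {(i, j): coeffs[i] * np.conj(coeffs[j]) for i, j in edges}
y, reached = propagate_and_recover(Phi, mags, products, edges)
```

The returned `y` should agree with `x` up to a single unimodular factor, and recovery only needs the reached vertices to carry a spanning subframe.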

First, we can take $\Phi_V$ to be full spark so that every subcollection of $M$ frame elements spans. This implies that $x$ is orthogonal to at most $M-1$ members of $\Phi_V$, thereby limiting the extent of $x$'s damage to our graph. Additionally, being full spark frees us from requiring the graph to be connected after the removal of vertices; indeed, any remaining component of size $M$ or more will correspond to a subframe of $\Phi_V$ that necessarily has a dual frame to reconstruct with. It remains to find a graph of $O(M)$ vertices and edges that maintains a size-$M$ component after the removal of any $M-1$ vertices.

To this end, we consider a well-studied family of sparse graphs known as expander graphs. We choose these graphs for their notably strong connectivity properties. There is a combinatorial definition of expander graphs, but we will focus on the spectral definition. Given a $d$-regular graph $G$ of $n$ vertices, consider the eigenvalues of its adjacency matrix: $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$. We say $G$ has spectral expansion $\lambda(G) := \frac{1}{d}\max\{|\lambda_2|,|\lambda_n|\}$. Furthermore, a family of $d$-regular graphs $\{G_j\}_{j=1}^{\infty}$ is a spectral expander family if there exists $c < 1$ such that every $G_j$ has expansion $\lambda(G_j) \le c$. Since $d$ is constant over an expander family, we see that expanders with many vertices are particularly sparse. There are many results which describe the connectivity of expanders, but the following is particularly relevant to our application:
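To make the definition concrete, the sketch below computes $\lambda(G)$ from the adjacency spectrum with NumPy. The Petersen graph is used purely as an illustration: it is 3-regular with adjacency eigenvalues $3$, $1$ (five times), and $-2$ (four times), so $\lambda(G) = 2/3$.

```python
import numpy as np


def spectral_expansion(A, d):
    """lambda(G) = max(|lambda_2|, |lambda_n|) / d for the adjacency matrix A of a d-regular graph."""
    eigs = np.sort(np.linalg.eigvalsh(A))[::-1]  # lambda_1 >= lambda_2 >= ... >= lambda_n
    return max(abs(eigs[1]), abs(eigs[-1])) / d


# Petersen graph: outer 5-cycle, inner pentagram, and five spokes (3-regular, n = 10)
A = np.zeros((10, 10))
for i in range(5):
    for u, v in [(i, (i + 1) % 5), (5 + i, 5 + (i + 2) % 5), (i, 5 + i)]:
        A[u, v] = A[v, u] = 1

lam = spectral_expansion(A, d=3)
```

With $\lambda = 2/3$, the lemma below would apply to this graph for any $\varepsilon \le (1-\lambda)/6 = 1/18$.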

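Lemma 25 ([78]). Consider a $d$-regular graph $G$ of $n$ vertices with spectral expansion $\lambda(G) \le \lambda$. For all $\varepsilon \le \frac{1-\lambda}{6}$, removing any $\varepsilon dn$ edges from $G$ results in a connected component of size $\ge \big(1 - \frac{2\varepsilon}{1-\lambda}\big)n$.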

For our application, removing $\varepsilon n$ vertices from a $d$-regular graph necessarily removes at most $\varepsilon dn$ edges, and so this lemma directly applies. Also,

$$\varepsilon \le \frac{1-\lambda}{6} \le \frac{1}{6} < \frac{2}{3} \le 1 - \frac{2\varepsilon}{1-\lambda},$$

where the last inequality is a rearrangement of $\varepsilon \le \frac{1-\lambda}{6}$. In particular, removing any $\varepsilon n$ vertices leaves a component with more than $\varepsilon n$ vertices. Since we want to guarantee that the removal of any $M-1$ vertices maintains a size-$M$ component, we must therefore take $M \le \varepsilon n + 1$. Overall, we use the following criteria to pick our expander graph: Given the signal dimension $M$, use a $d$-regular graph $G = (V,E)$ of $n$ vertices with spectral expansion $\lambda$ such that $M \le \frac{1-\lambda}{6}n + 1$.

Then by the previous discussion, the total number of measurements is $N = |V| + 4|E| = n + 2dn = (2d+1)n$. We wish to find choices of graphs which yield only $N = O(M)$ measurements.

To minimize the redundancy $N/M$, we see that for a fixed degree $d$, we would like minimal spectral expansion $\lambda$. Spectral graph families known as Ramanujan graphs are asymptotically optimal in this sense; taking $\mathcal{G}_{n,d}$ to be the set of connected $d$-regular graphs with at least $n$ vertices, Alon and Boppana (see [4]) showed that for any fixed $d$,

$$\lim_{n\to\infty}\inf_{G\in\mathcal{G}_{n,d}}\lambda(G) \ge \frac{2\sqrt{d-1}}{d},$$

while Ramanujan graphs are defined to have spectral expansion $\lambda(G) \le \frac{2\sqrt{d-1}}{d}$. To date, Ramanujan graphs have only been constructed for certain values of $d$. One important construction was given by Lubotzky et al. [98], which produces a Ramanujan family whenever $d-1 \equiv 1 \bmod 4$ is prime. Among these graphs, we get the smallest redundancy $N/M$ when $d = 6$ and $M = \lfloor(\frac{1-\lambda}{6})n + 1\rfloor$:

$$\frac{N}{M} \le \frac{(2d+1)n}{(1-\lambda)n/6} \le \frac{6d(2d+1)}{d-2\sqrt{d-1}} = \frac{234}{3-\sqrt{5}} \approx 306.3.$$

Thus, in such cases, we may perform phaseless recovery with only $N \le 307M$ measurements. However, the number of vertices in each Ramanujan graph from [98] is of the form $q(q^2-1)$ or $\frac{q(q^2-1)}{2}$, where $q \equiv 1 \bmod 4$ is prime, and so any bound on redundancy using graphs from [98] will only be valid for particular values of $M$.
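The arithmetic behind the choice $d = 6$ can be reproduced by evaluating $6d(2d+1)/(d-2\sqrt{d-1})$ over the degrees admitted by the construction of [98], i.e., those with $d-1$ a prime congruent to 1 mod 4. The helper names and the search range are arbitrary choices for this sketch.

```python
import math


def lps_redundancy_bound(d):
    """Upper bound 6d(2d+1) / (d - 2 sqrt(d-1)) on N/M for a degree-d Ramanujan graph."""
    return 6 * d * (2 * d + 1) / (d - 2 * math.sqrt(d - 1))


def is_prime(m):
    return m >= 2 and all(m % r for r in range(2, math.isqrt(m) + 1))


# Degrees d with d - 1 a prime congruent to 1 mod 4
degrees = [d for d in range(3, 200) if is_prime(d - 1) and (d - 1) % 4 == 1]
bounds = {d: lps_redundancy_bound(d) for d in degrees}
best = min(bounds, key=bounds.get)
```

The bound grows roughly like $12d$ for large $d$, so the smallest admissible degree wins.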

In order to get $N = O(M)$ in general, we use the fact that random graphs are nearly Ramanujan with high probability. In particular, for every $\varepsilon > 0$ and even $d$, a random $d$-regular graph has spectral expansion $\lambda \le \frac{1}{d}(2\sqrt{d-1} + \varepsilon)$ with high probability as $n \to \infty$ [71]. Thus, picking $\varepsilon$ and $d$ to satisfy $\frac{1}{d}(2\sqrt{d-1} + \varepsilon) < 1$, we may again take $M = \lfloor(\frac{1-\lambda}{6})n + 1\rfloor$ to get

$$\frac{N}{M} \le \frac{6(2d+1)}{1-\lambda} \le \frac{6d(2d+1)}{d - (2\sqrt{d-1} + \varepsilon)}$$

with high probability. Note that in this case, $n$ can be any sufficiently large integer, and so the above bound is valid for all sufficiently large $M$, i.e., our procedure can perform phaseless recovery with $N = O(M)$ measurements in general.
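One way to see the "nearly Ramanujan" behavior numerically is to sample a random $d$-regular multigraph as the union of $d/2$ random permutations and their inverses, a standard random-regular model for even $d$ that we adopt here as an assumption (it may differ in details from the model analyzed in [71]), and then read off $\lambda$, the admissible $M$, and $N = (2d+1)n$. A single sample illustrates, but does not prove, the with-high-probability statement.

```python
import numpy as np


def random_regular_multigraph(n, d, rng):
    """Adjacency matrix of a random d-regular multigraph (d even): the union of
    d/2 uniformly random permutations and their inverses. Loops and repeated
    edges can occur; a loop contributes 2 to its vertex's degree."""
    if d % 2:
        raise ValueError("d must be even")
    P = np.zeros((n, n))
    for _ in range(d // 2):
        P[np.arange(n), rng.permutation(n)] += 1  # add one permutation matrix
    return P + P.T


rng = np.random.default_rng(2)
n, d = 1000, 6
A = random_regular_multigraph(n, d, rng)
eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
lam = max(abs(eigs[1]), abs(eigs[-1])) / d  # spectral expansion
ramanujan = 2 * np.sqrt(d - 1) / d           # about 0.745 for d = 6
M = int((1 - lam) / 6 * n + 1)              # largest admissible signal dimension
N = (2 * d + 1) * n                          # |V| + 4|E| measurements
```

Comparing `lam` with `ramanujan` for growing `n` gives a feel for how quickly the spectral gap approaches the Alon–Boppana limit.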

Note that this section has only considered the case in which the phaseless measurements were not corrupted by noise. For the noisy case, Candès et al. [37] used semidefinite programming to stably reconstruct from $N = O(M\log M)$ measurements. Our technique also appears to be stable, and we expect positive results in this vein using synchronization-type analysis [124]; we leave this for future work.
Chapter 3

Deterministic matrices with the 
restricted isometry property 



In Chapter 1, we observed how to use the Gershgorin circle theorem to demonstrate that certain $M \times N$ matrices have the restricted isometry property (RIP) for sparsity levels $K = O(\sqrt{M})$. In this chapter, we consider better demonstration techniques which promise to break this "square-root bottleneck" [16]. To date, the only deterministic construction that manages to go beyond the bottleneck is given by Bourgain et al. [29]; in the following section, we discuss what they call flat RIP, which is the technique they use to demonstrate RIP. We will see that their technique can be used to demonstrate RIP for sparsity levels much larger than $\sqrt{M}$, meaning one could very well demonstrate random-like performance given the proper construction. Later, we introduce an alternate technique, which can also demonstrate RIP for large sparsity levels.

After considering the efficacy of these techniques to demonstrate RIP, it remains to find a deterministic construction that is amenable to analysis. To this end, we discuss various properties of certain equiangular tight frames (ETFs). Specifically, real ETFs can be characterized in terms of their Gram matrices using strongly regular graphs [141]. By applying our demonstration techniques to real ETFs, we derive equivalent combinatorial statements in graph theory. By focusing on the ETFs which correspond to Paley graphs of prime order, we are able to make important statements about their clique numbers and provide some intuition for an open problem in number theory. We conclude by conjecturing that the Paley ETFs are RIP in a manner similar to random matrices.
3.1 Flat restricted orthogonality

In [29], Bourgain et al. provided a deterministic construction of $M \times N$ RIP matrices that support sparsity levels $K$ on the order of $M^{1/2+\varepsilon}$ for some small value of $\varepsilon$. To date, this is the only known deterministic RIP construction that breaks the square-root bottleneck. In this section, we analyze their technique for demonstrating RIP, but first, we provide some historical context. We begin with a definition:

Definition 26. The matrix $\Phi$ has $(K,\theta)$-restricted orthogonality (RO) if

$$|\langle\Phi x,\Phi y\rangle| \le \theta\|x\|\|y\|$$

for every pair of $K$-sparse vectors $x$, $y$ with disjoint support. The smallest $\theta$ for which $\Phi$ has $(K,\theta)$-RO is the restricted orthogonality constant (ROC) $\theta_K$.

In the past, restricted orthogonality was studied to produce reconstruction performance guarantees for both $\ell_1$-minimization and the Dantzig selector [38, 40]. Intuitively, restricted orthogonality is important to compressed sensing because any stable inversion process for (1) would require $\Phi$ to map vectors of disjoint support to particularly dissimilar measurements. For the present chapter, we are interested in upper bounds on RICs; in this spirit, the following result illustrates a rough equivalence between RICs and ROCs:

Lemma 27 (Lemma 1.2 in [38]). $\theta_K \le \delta_{2K} \le \theta_K + \delta_K$.

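To be fair, the above upper bound on $\delta_{2K}$ does not immediately help in estimating $\delta_{2K}$, as it requires one to estimate $\delta_K$. Certainly, we may iteratively apply this bound to get

$$\delta_{2K} \le \theta_K + \theta_{\lceil K/2\rceil} + \theta_{\lceil K/4\rceil} + \cdots + \theta_1 + \delta_1 \le (1 + \lceil\log_2 K\rceil)\theta_K + \delta_1. \tag{3.1}$$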

Note that $\delta_1$ is particularly easy to calculate:

$$\delta_1 = \max_{n\in\{1,\ldots,N\}}\big|\|\varphi_n\|^2 - 1\big|,$$

which is zero when the columns of $\Phi$ have unit norm. In pursuit of a better upper bound on $\delta_{2K}$, we use techniques from [29] to remove the log factor from (3.1):

Lemma 28. $\delta_{2K} \le 2\theta_K + \delta_1$.
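Proof. Given a matrix $\Phi = [\varphi_1 \cdots \varphi_N]$, we want to upper-bound the smallest $\delta$ for which $(1-\delta)\|x\|^2 \le \|\Phi x\|^2 \le (1+\delta)\|x\|^2$, or equivalently:

$$\delta \ge \bigg|\Big\|\Phi\frac{x}{\|x\|}\Big\|^2 - 1\bigg| \tag{3.2}$$

for every nonzero $2K$-sparse vector $x$. We observe from (3.2) that we may take $x$ to have unit norm without loss of generality. Letting $\mathcal{K}$ denote a size-$2K$ set that contains the support of $x$, and letting $\{x_k\}_{k\in\mathcal{K}}$ denote the corresponding entries of $x$, the triangle inequality gives

$$\begin{aligned}
\big|\|\Phi x\|^2 - 1\big| &= \bigg|\Big\langle\sum_{i\in\mathcal{K}}x_i\varphi_i,\sum_{j\in\mathcal{K}}x_j\varphi_j\Big\rangle - 1\bigg| = \bigg|\sum_{i\in\mathcal{K}}\sum_{\substack{j\in\mathcal{K}\\ j\ne i}}\langle x_i\varphi_i,x_j\varphi_j\rangle + \sum_{i\in\mathcal{K}}\|x_i\varphi_i\|^2 - 1\bigg|\\
&\le \bigg|\sum_{i\in\mathcal{K}}\sum_{\substack{j\in\mathcal{K}\\ j\ne i}}\langle x_i\varphi_i,x_j\varphi_j\rangle\bigg| + \bigg|\sum_{i\in\mathcal{K}}\|x_i\varphi_i\|^2 - 1\bigg|.
\end{aligned} \tag{3.3}$$

Since $\sum_{i\in\mathcal{K}}|x_i|^2 = 1$, the second term of (3.3) satisfies

$$\bigg|\sum_{i\in\mathcal{K}}\|x_i\varphi_i\|^2 - 1\bigg| = \bigg|\sum_{i\in\mathcal{K}}|x_i|^2\big(\|\varphi_i\|^2 - 1\big)\bigg| \le \sum_{i\in\mathcal{K}}|x_i|^2\,\big|\|\varphi_i\|^2 - 1\big| \le \sum_{i\in\mathcal{K}}|x_i|^2\,\delta_1 = \delta_1, \tag{3.4}$$

and so it remains to bound the first term of (3.3). To this end, we note that for each $i,j \in \mathcal{K}$ with $j \ne i$, the term $\langle x_i\varphi_i,x_j\varphi_j\rangle$ appears in

$$\sum_{\substack{\mathcal{I}\subseteq\mathcal{K}\\ |\mathcal{I}|=K}}\sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{K}\setminus\mathcal{I}}\langle x_i\varphi_i,x_j\varphi_j\rangle$$

as many times as there are size-$K$ subsets of $\mathcal{K}$ which contain $i$ but not $j$, i.e., $\binom{2K-2}{K-1}$ times. Thus, we use the triangle inequality and the definition of restricted orthogonality to get

$$\begin{aligned}
\bigg|\sum_{i\in\mathcal{K}}\sum_{\substack{j\in\mathcal{K}\\ j\ne i}}\langle x_i\varphi_i,x_j\varphi_j\rangle\bigg|
&= \binom{2K-2}{K-1}^{-1}\bigg|\sum_{\substack{\mathcal{I}\subseteq\mathcal{K}\\ |\mathcal{I}|=K}}\Big\langle\sum_{i\in\mathcal{I}}x_i\varphi_i,\sum_{j\in\mathcal{K}\setminus\mathcal{I}}x_j\varphi_j\Big\rangle\bigg|\\
&\le \binom{2K-2}{K-1}^{-1}\sum_{\substack{\mathcal{I}\subseteq\mathcal{K}\\ |\mathcal{I}|=K}}\bigg|\Big\langle\sum_{i\in\mathcal{I}}x_i\varphi_i,\sum_{j\in\mathcal{K}\setminus\mathcal{I}}x_j\varphi_j\Big\rangle\bigg|\\
&\le \binom{2K-2}{K-1}^{-1}\sum_{\substack{\mathcal{I}\subseteq\mathcal{K}\\ |\mathcal{I}|=K}}\theta_K\Big(\sum_{i\in\mathcal{I}}|x_i|^2\Big)^{1/2}\Big(\sum_{j\in\mathcal{K}\setminus\mathcal{I}}|x_j|^2\Big)^{1/2}.
\end{aligned}$$

At this point, $x$ having unit norm implies $(\sum_{i\in\mathcal{I}}|x_i|^2)^{1/2}(\sum_{j\in\mathcal{K}\setminus\mathcal{I}}|x_j|^2)^{1/2} \le \frac{1}{2}$ (since $ab \le \frac{a^2+b^2}{2}$), and so

$$\bigg|\sum_{i\in\mathcal{K}}\sum_{\substack{j\in\mathcal{K}\\ j\ne i}}\langle x_i\varphi_i,x_j\varphi_j\rangle\bigg| \le \binom{2K-2}{K-1}^{-1}\binom{2K}{K}\frac{\theta_K}{2} = \frac{2K(2K-1)}{K^2}\cdot\frac{\theta_K}{2} = \Big(2 - \frac{1}{K}\Big)\theta_K \le 2\theta_K.$$

Applying both this and (3.4) to (3.3) gives the result. □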
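For very small matrices, $\delta_K$ and $\theta_K$ can be computed by exhaustive search, which gives a quick numerical check of Lemmas 27 and 28. The sketch below is an illustration under assumptions of our own (a real Gaussian $6 \times 8$ matrix and $K = 2$); it uses the fact that $\theta_K$ is the largest spectral norm of a cross-block $\Phi_{\mathcal{I}}^*\Phi_{\mathcal{J}}$ over disjoint size-$K$ sets. Exhaustive search is exponential in $N$ and only meant for toy sizes.

```python
import itertools
import numpy as np


def ric(Phi, K):
    """delta_K: largest ||Phi_S^* Phi_S - I_K||_2 over size-K column subsets S."""
    best = 0.0
    for S in itertools.combinations(range(Phi.shape[1]), K):
        cols = Phi[:, list(S)]
        best = max(best, np.linalg.norm(cols.conj().T @ cols - np.eye(K), 2))
    return best


def roc(Phi, K):
    """theta_K: largest ||Phi_I^* Phi_J||_2 over disjoint size-K subsets I, J."""
    N = Phi.shape[1]
    best = 0.0
    for I in itertools.combinations(range(N), K):
        rest = [n for n in range(N) if n not in I]
        for J in itertools.combinations(rest, K):
            block = Phi[:, list(I)].conj().T @ Phi[:, list(J)]
            best = max(best, np.linalg.norm(block, 2))
    return best


rng = np.random.default_rng(3)
M, N, K = 6, 8, 2
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
delta_1 = np.max(np.abs(np.sum(np.abs(Phi) ** 2, axis=0) - 1))
delta_K, delta_2K, theta_K = ric(Phi, K), ric(Phi, 2 * K), roc(Phi, K)
```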

Having discussed the relationship between restricted isometry and restricted orthogonality, we 
are now ready to introduce the property used in [29] to demonstrate RIP: 

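Definition 29. The matrix $\Phi = [\varphi_1 \cdots \varphi_N]$ has $(K,\theta)$-flat restricted orthogonality if

$$\bigg|\Big\langle\sum_{i\in\mathcal{I}}\varphi_i,\sum_{j\in\mathcal{J}}\varphi_j\Big\rangle\bigg| \le \theta(|\mathcal{I}||\mathcal{J}|)^{1/2}$$

for every disjoint pair of subsets $\mathcal{I},\mathcal{J} \subseteq \{1,\ldots,N\}$ with $|\mathcal{I}|,|\mathcal{J}| \le K$.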

Note that $\Phi$ has $(K,\theta_K)$-flat restricted orthogonality (FRO) by taking $x$ and $y$ in Definition 26 to be the characteristic functions $\chi_{\mathcal{I}}$ and $\chi_{\mathcal{J}}$, respectively. Also, to be clear, flat restricted orthogonality is called flat RIP in [29]; we feel the name change is appropriate considering the preceding literature. Moreover, the definition of flat RIP in [29] required $\Phi$ to have unit-norm columns, whereas we strengthen the corresponding results so as to make no such requirement. Interestingly, FRO bears some resemblance to the cut-norm of the Gram matrix $\Phi^*\Phi$, defined as the maximum value of $|\sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}\langle\varphi_i,\varphi_j\rangle|$ over all subsets $\mathcal{I},\mathcal{J} \subseteq \{1,\ldots,N\}$; the cut-norm has received some attention recently for the hardness of its approximation [6]. The following theorem illustrates the utility of flat restricted orthogonality as an estimate of the RIC:

Theorem 30. A matrix with $(K,\theta)$-flat restricted orthogonality has a restricted orthogonality constant $\theta_K$ which is $\le C\theta\log K$, and we may take $C = 75$.

Indeed, when combined with Lemma 28, this result gives an upper bound on the RIC: $\delta_{2K} \le 2C\theta\log K + \delta_1$. The noteworthy benefit of this upper bound is that the problem of estimating singular values of submatrices is reduced to a combinatorial problem of bounding the coherence of disjoint sums of columns. Furthermore, this reduction comes at the price of a mere log factor in the estimate. In [29], Bourgain et al. managed to satisfy this combinatorial coherence property using techniques from additive combinatorics. While we will not discuss their construction, we find the proof of Theorem 30 to be instructive; our proof is valid for all values of $K$ (as opposed to sufficiently large $K$ in the original [29]), and it has near-optimal constants where appropriate. The proof can be found in the Appendix.
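Flat restricted orthogonality only involves sums of columns, so its optimal constant is also easy to compute exhaustively for toy sizes. The sketch below (hypothetical helper names, a random $6 \times 8$ Gaussian matrix, $K = 2$) computes that constant and checks the remark following Definition 29 that it never exceeds $\theta_K$.

```python
import itertools
import numpy as np


def fro_constant(Phi, K):
    """Smallest theta for which Phi has (K, theta)-flat restricted orthogonality."""
    N = Phi.shape[1]
    best = 0.0
    for a in range(1, K + 1):
        for I in itertools.combinations(range(N), a):
            u = Phi[:, list(I)].sum(axis=1)
            rest = [n for n in range(N) if n not in I]
            for b in range(1, K + 1):
                for J in itertools.combinations(rest, b):
                    v = Phi[:, list(J)].sum(axis=1)
                    best = max(best, abs(np.vdot(v, u)) / np.sqrt(a * b))
    return best


def roc(Phi, K):
    """theta_K via exhaustive search over disjoint size-K supports."""
    N = Phi.shape[1]
    best = 0.0
    for I in itertools.combinations(range(N), K):
        rest = [n for n in range(N) if n not in I]
        for J in itertools.combinations(rest, K):
            block = Phi[:, list(I)].conj().T @ Phi[:, list(J)]
            best = max(best, np.linalg.norm(block, 2))
    return best


rng = np.random.default_rng(5)
M, N, K = 6, 8, 2
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
theta_fro = fro_constant(Phi, K)
theta_K = roc(Phi, K)
```

Taking $|\mathcal{I}| = |\mathcal{J}| = 1$ shows the FRO constant is at least the worst-case coherence, so it sits between $\mu$ and $\theta_K$.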

To reiterate, Bourgain et al. [29] used flat restricted orthogonality to build the only known deterministic construction of $M \times N$ RIP matrices that support sparsity levels $K$ on the order of $M^{1/2+\varepsilon}$ for some small value of $\varepsilon$. We are particularly interested in the efficacy of FRO as a technique to demonstrate RIP in general. Certainly, [29] shows that FRO can produce at least an $\varepsilon$ improvement over the Gershgorin technique discussed in Chapter 1, but it remains to be seen whether FRO can do better.

In the remainder of this section, we will show that flat restricted orthogonality is actually capable 
of demonstrating RIP with much higher sparsity levels than indicated by [29]. Hopefully, this 
realization will spur further research in deterministic constructions which satisfy FRO. To evaluate 
FRO, we investigate how well it performs with random matrices; in doing so, we give an alternative 
proof that certain random matrices satisfy RIP with high probability: 

Theorem 31. Construct an $M \times N$ matrix $\Phi$ by drawing each of its entries independently from a Gaussian distribution with mean zero and variance $\frac{1}{M}$, take $C$ to be the constant from Theorem 30, and set $\alpha = 0.01$. Then $\Phi$ has $(K,\frac{(1-\alpha)\delta}{2C\log K})$-flat restricted orthogonality and $\delta_1 \le \alpha\delta$, and therefore the $(2K,\delta)$-restricted isometry property, with high probability provided $M \ge \frac{33C^2}{\delta^2}K\log^2K\log N$.



In proving this result, we will make use of the following Bernstein inequality: 

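Theorem 32 (see [23, 148]). Let $\{Z_m\}_{m=1}^{M}$ be independent random variables of mean zero with bounded moments, and suppose there exists $L > 0$ such that

$$\mathbb{E}|Z_m|^k \le \frac{\mathbb{E}|Z_m|^2}{2}L^{k-2}k! \tag{3.5}$$

for every $k \ge 2$. Then

$$\Pr\bigg[\sum_{m=1}^{M}Z_m \ge 2t\Big(\sum_{m=1}^{M}\mathbb{E}|Z_m|^2\Big)^{1/2}\bigg] \le e^{-t^2} \tag{3.6}$$

provided $t \le \frac{1}{2L}\big(\sum_{m=1}^{M}\mathbb{E}|Z_m|^2\big)^{1/2}$.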



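Proof of Theorem 31. Considering Lemma 28, it suffices to show that $\Phi$ has restricted orthogonality and that $\delta_1$ is sufficiently small. First, to demonstrate restricted orthogonality, it suffices to demonstrate FRO by Theorem 30, and so we will ensure that the following quantity is small:

$$\Big\langle\sum_{i\in\mathcal{I}}\varphi_i,\sum_{j\in\mathcal{J}}\varphi_j\Big\rangle = \sum_{m=1}^{M}\Big(\sum_{i\in\mathcal{I}}\varphi_i[m]\Big)\Big(\sum_{j\in\mathcal{J}}\varphi_j[m]\Big). \tag{3.7}$$

Notice that $X_m := \sum_{i\in\mathcal{I}}\varphi_i[m]$ and $Y_m := \sum_{j\in\mathcal{J}}\varphi_j[m]$ are mutually independent over all $m = 1,\ldots,M$ since $\mathcal{I}$ and $\mathcal{J}$ are disjoint. Also, $X_m$ is Gaussian with mean zero and variance $\frac{|\mathcal{I}|}{M}$, while $Y_m$ similarly has mean zero and variance $\frac{|\mathcal{J}|}{M}$. Viewed this way, (3.7) being small corresponds to the sum of independent random variables $Z_m := X_mY_m$ having its probability measure concentrated at zero. To this end, Theorem 32 is naturally applicable, as the absolute central moments of a Gaussian random variable $X$ with mean zero and variance $\sigma^2$ are well known:

$$\mathbb{E}|X|^k = \begin{cases}\sqrt{2/\pi}\,\sigma^k(k-1)!! & \text{if } k \text{ odd},\\ \sigma^k(k-1)!! & \text{if } k \text{ even}.\end{cases}$$

Since $Z_m = X_mY_m$ is a product of independent Gaussian random variables, this gives

$$\mathbb{E}|Z_m|^k = \mathbb{E}|X_m|^k\,\mathbb{E}|Y_m|^k \le \Big(\frac{|\mathcal{I}|}{M}\Big)^{k/2}\Big(\frac{|\mathcal{J}|}{M}\Big)^{k/2}\big((k-1)!!\big)^2 \le \Big(\frac{(|\mathcal{I}||\mathcal{J}|)^{1/2}}{M}\Big)^k k!.$$

Further, since $\mathbb{E}|Z_m|^2 = \frac{|\mathcal{I}||\mathcal{J}|}{M^2}$, we may define $L := \frac{2(|\mathcal{I}||\mathcal{J}|)^{1/2}}{M}$ to get (3.5). Later, we will take $\theta \le \delta \le \sqrt{2} - 1 < \frac{1}{2}$. Considering

$$t := \frac{\theta M^{1/2}}{2} < \frac{M^{1/2}}{4} = \frac{1}{2L}\Big(\frac{|\mathcal{I}||\mathcal{J}|}{M}\Big)^{1/2} = \frac{1}{2L}\Big(\sum_{m=1}^{M}\mathbb{E}|Z_m|^2\Big)^{1/2},$$

we therefore have (3.6), which in this case has the form

$$\Pr\bigg[\Big|\Big\langle\sum_{i\in\mathcal{I}}\varphi_i,\sum_{j\in\mathcal{J}}\varphi_j\Big\rangle\Big| \ge \theta(|\mathcal{I}||\mathcal{J}|)^{1/2}\bigg] \le 2e^{-M\theta^2/4},$$

where the probability is doubled due to the symmetric distribution of $\sum_{m=1}^{M}Z_m$. Since we need to account for all possible choices of $\mathcal{I}$ and $\mathcal{J}$, we will perform a union bound. The total number of choices is given by

$$\sum_{|\mathcal{I}|=1}^{K}\sum_{|\mathcal{J}|=1}^{K}\binom{N}{|\mathcal{I}|}\binom{N-|\mathcal{I}|}{|\mathcal{J}|} \le N^{2K},$$

and so the union bound gives

$$\Pr\big[\Phi \text{ does not have } (K,\theta)\text{-FRO}\big] \le 2e^{-M\theta^2/4}N^{2K} = 2\exp\Big(-\frac{M\theta^2}{4} + 2K\log N\Big). \tag{3.8}$$

Thus, Gaussian matrices tend to have FRO, and hence restricted orthogonality by Theorem 30; this is made more precise below.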
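Three ingredients of this argument are easy to check numerically: that $L = 2(|\mathcal{I}||\mathcal{J}|)^{1/2}/M$ satisfies the moment condition (3.5) for $Z_m = X_mY_m$, that $t = \theta M^{1/2}/2$ is admissible in (3.6), and that the number of pairs $(\mathcal{I},\mathcal{J})$ is at most $N^{2K}$. The sketch below uses exact Gaussian absolute moments; the particular sizes ($M = 100$, $|\mathcal{I}| = 3$, $|\mathcal{J}| = 5$, $\theta = 0.4$) are arbitrary choices of ours.

```python
import math


def double_factorial(n):
    """n!! with the convention (-1)!! = 0!! = 1."""
    out = 1
    while n > 1:
        out *= n
        n -= 2
    return out


def gaussian_abs_moment(sigma, k):
    """E|X|^k for X ~ N(0, sigma^2)."""
    m = sigma ** k * double_factorial(k - 1)
    return m * math.sqrt(2 / math.pi) if k % 2 else m


# Z = X * Y with independent X ~ N(0, |I|/M) and Y ~ N(0, |J|/M)
M, size_I, size_J, theta = 100, 3, 5, 0.4
sx, sy = math.sqrt(size_I / M), math.sqrt(size_J / M)
EZ2 = gaussian_abs_moment(sx, 2) * gaussian_abs_moment(sy, 2)  # = |I||J| / M^2
L = 2 * math.sqrt(size_I * size_J) / M

# Moment condition (3.5): E|Z|^k <= E|Z|^2 / 2 * L^(k-2) * k!
moments_ok = all(
    gaussian_abs_moment(sx, k) * gaussian_abs_moment(sy, k)
    <= EZ2 / 2 * L ** (k - 2) * math.factorial(k)
    for k in range(2, 40)
)

# Admissibility of t in (3.6), using sum_m E|Z_m|^2 = M * EZ2
t = theta * math.sqrt(M) / 2
t_ok = t <= math.sqrt(M * EZ2) / (2 * L)


def count_pairs(N, K):
    """Ordered pairs of disjoint nonempty I, J in {1, ..., N} with |I|, |J| <= K."""
    return sum(math.comb(N, a) * math.comb(N - a, b)
               for a in range(1, K + 1) for b in range(1, K + 1))


count_ok = all(count_pairs(N, K) <= N ** (2 * K)
               for N in range(2, 25) for K in range(1, N + 1))
```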

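Again by Lemma 28, it remains to show that $\delta_1$ is sufficiently small. To this end, we note that $M\|\varphi_n\|^2$ has chi-squared distribution with $M$ degrees of freedom, and so we can use another (simpler) concentration-of-measure result; see Lemma 1 of [95]:

$$\Pr\bigg[\big|\|\varphi_n\|^2 - 1\big| \ge 2\Big(\sqrt{\frac{t}{M}} + \frac{t}{M}\Big)\bigg] \le 2e^{-t}$$

for any $t > 0$. Specifically, we pick

$$\delta' := 2\Big(\sqrt{\frac{t}{M}} + \frac{t}{M}\Big) \le 4\sqrt{\frac{t}{M}} \qquad (\text{valid for } t \le M),$$

and we perform a union bound over the $N$ choices for $n$:

$$\Pr[\delta_1 > \delta'] \le 2Ne^{-t} \le 2\exp\Big(-\frac{M\delta'^2}{16} + \log N\Big). \tag{3.9}$$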



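To summarize, Lemma 28, the union bound, Theorem 30, and (3.8) and (3.9) give

$$\begin{aligned}
\Pr[\delta_{2K} > \delta] &\le \Pr[2\theta_K + \delta_1 > \delta]\\
&\le \Pr\Big[\theta_K > \frac{(1-\alpha)\delta}{2} \text{ or } \delta_1 > \alpha\delta\Big]\\
&\le \Pr\Big[\theta_K > \frac{(1-\alpha)\delta}{2}\Big] + \Pr[\delta_1 > \alpha\delta]\\
&\le \Pr\Big[\Phi \text{ does not have } \Big(K,\frac{(1-\alpha)\delta}{2C\log K}\Big)\text{-FRO}\Big] + \Pr[\delta_1 > \alpha\delta]\\
&\le 2\exp\bigg(-\frac{M}{4}\Big(\frac{(1-\alpha)\delta}{2C\log K}\Big)^2 + 2K\log N\bigg) + 2\exp\Big(-\frac{M\alpha^2\delta^2}{16} + \log N\Big),
\end{aligned}$$

and so $M \ge \frac{33C^2}{\delta^2}K\log^2K\log N$ gives that $\Phi$ has $(2K,\delta)$-RIP with high probability. □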
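The constant in the measurement bound can be sanity-checked by plugging $M = \lceil\frac{33C^2}{\delta^2}K\log^2K\log N\rceil$ into the two exponents of the final bound, with $C = 75$ and $\alpha = 0.01$ as in Theorem 31 and natural logarithms (an assumption on our part, since the text does not fix the base). Both exponents should come out negative. The grid of $(K, N, \delta)$ values is arbitrary.

```python
import math


def failure_exponents(M, K, N, delta, C=75.0, alpha=0.01):
    """Exponents of the two terms in the final bound of the proof of Theorem 31."""
    theta = (1 - alpha) * delta / (2 * C * math.log(K))  # FRO parameter
    e_fro = -M / 4 * theta ** 2 + 2 * K * math.log(N)
    e_diag = -M * alpha ** 2 * delta ** 2 / 16 + math.log(N)
    return e_fro, e_diag


def required_M(K, N, delta, C=75.0):
    """Smallest integer M with M >= 33 C^2 / delta^2 * K log^2 K log N."""
    return math.ceil(33 * C ** 2 / delta ** 2 * K * math.log(K) ** 2 * math.log(N))


results = {}
for K in (2, 5, 50):
    for N in (10 ** 3, 10 ** 6):
        for delta in (0.1, 0.3):
            M = required_M(K, N, delta)
            results[(K, N, delta)] = failure_exponents(M, K, N, delta)
```

The FRO exponent works out to about $-0.02\,K\log N$, so the factor 33 leaves only a small margin over the $2K\log N$ union-bound term.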



We note that a version of Theorem 31 also holds for matrices whose entries are independent Bernoulli random variables taking values $\pm\frac{1}{\sqrt{M}}$ with equal probability. In this case, one can again apply Theorem 32 by comparing moments with those of the Gaussian distribution; also, a union bound with (3.9) will not be necessary since the columns have unit norm, meaning $\delta_1 = 0$.
3.2 Restricted isometry by the power method

In the previous section, we established the efficacy of flat restricted orthogonality as a technique to demonstrate RIP. While flat restricted orthogonality has proven useful in the past [29], future deterministic RIP constructions might not use this technique. Indeed, it would be helpful to have other techniques available that demonstrate RIP beyond the square-root bottleneck. In pursuit of such techniques, we recall that the smallest $\delta$ for which $\Phi$ is $(K,\delta)$-RIP is given in terms of operator norms in (1.1). In addition, we notice that for any self-adjoint matrix $A$,

$$\|A\|_2 = \|\lambda(A)\|_\infty \le \|\lambda(A)\|_p,$$

where $\lambda(A)$ denotes the spectrum of $A$ with multiplicities. Let $A = UDU^*$ be the eigenvalue decomposition of $A$. When $p$ is even, we can express $\|\lambda(A)\|_p$ in terms of an easy-to-calculate trace:

$$\|\lambda(A)\|_p^p = \operatorname{Tr}[D^p] = \operatorname{Tr}[(UDU^*)^p] = \operatorname{Tr}[A^p].$$

Combining these ideas with the fact that $\|\cdot\|_p \to \|\cdot\|_\infty$ pointwise leads to the following:
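Theorem 33. Given an $M \times N$ matrix $\Phi$, define

$$\delta_{K;q} := \max_{\substack{\mathcal{K}\subseteq\{1,\ldots,N\}\\ |\mathcal{K}|=K}}\operatorname{Tr}\big[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^{2q}\big]^{\frac{1}{2q}}.$$

Then $\Phi$ has the $(K,\delta_{K;q})$-restricted isometry property for every $q \ge 1$. Moreover, the restricted isometry constant of $\Phi$ is approached by these estimates: $\lim_{q\to\infty}\delta_{K;q} = \delta_K$.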
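The convergence in Theorem 33 is easy to watch on a toy example: the sketch below computes $\delta_{K;q}$ by exhaustive search for several $q$ and compares it with the exact $\delta_K$. It assumes a random $6 \times 8$ Gaussian matrix and $K = 3$ (our choices); the estimates should never fall below $\delta_K$, should be non-increasing in $q$, and should lie within a factor $K^{1/(2q)}$ of $\delta_K$.

```python
import itertools
import numpy as np


def sub_grams(Phi, K):
    """Yield Phi_S^* Phi_S - I_K for every size-K column subset S."""
    N = Phi.shape[1]
    for S in itertools.combinations(range(N), K):
        cols = Phi[:, list(S)]
        yield cols.conj().T @ cols - np.eye(K)


def delta_Kq(Phi, K, q):
    """Power-method estimate delta_{K;q} from Theorem 33."""
    best = 0.0
    for B in sub_grams(Phi, K):
        tr = np.trace(np.linalg.matrix_power(B, 2 * q)).real
        best = max(best, max(tr, 0.0) ** (1 / (2 * q)))
    return best


def delta_K(Phi, K):
    """Exact restricted isometry constant via spectral norms."""
    return max(np.linalg.norm(B, 2) for B in sub_grams(Phi, K))


rng = np.random.default_rng(6)
M, N, K = 6, 8, 3
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
exact = delta_K(Phi, K)
estimates = [delta_Kq(Phi, K, q) for q in (1, 2, 4, 8, 16, 32, 64)]
```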

Similar to flat restricted orthogonality, this power method has a combinatorial aspect that prompts one to check every sub-Gram matrix of size $K$; one could argue that the power method is slightly less combinatorial, as flat restricted orthogonality is a statement about all pairs of disjoint subsets of size $\le K$. Regardless, the work of Bourgain et al. [29] illustrates that combinatorial properties can be useful, and there may exist constructions to which the power method would be naturally applied. Moreover, we note that since $\delta_{K;q}$ approaches $\delta_K$, a sufficiently large choice of $q$ should deliver better-than-$\varepsilon$ improvement over the Gershgorin analysis. How large should $q$ be? If we assume $\Phi$ has unit-norm columns, taking $q = 1$ gives

$$\delta_{K;1}^2 = \max_{\substack{\mathcal{K}\subseteq\{1,\ldots,N\}\\ |\mathcal{K}|=K}}\operatorname{Tr}\big[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^2\big] = \max_{\substack{\mathcal{K}\subseteq\{1,\ldots,N\}\\ |\mathcal{K}|=K}}\sum_{i\in\mathcal{K}}\sum_{\substack{j\in\mathcal{K}\\ j\ne i}}|\langle\varphi_i,\varphi_j\rangle|^2 \le K(K-1)\mu^2, \tag{3.10}$$

where $\mu$ is the worst-case coherence of $\Phi$. Equality is achieved above whenever $\Phi$ is an ETF, in which case (3.10) along with reasoning similar to (1.5) demonstrates that $\Phi$ is RIP with sparsity levels on the order of $\sqrt{M}$, as the Gershgorin analysis established. It remains to be shown how $\delta_{K;2}$ compares. To make this comparison, we apply the power method to random matrices:

Theorem 34. Construct an $M \times N$ matrix $\Phi$ by drawing each of its entries independently from a Gaussian distribution with mean zero and variance $\frac{1}{M}$, and take $\delta_{K;q}$ to be as defined in Theorem 33. Then $\delta_{K;q} < \delta$, and therefore $\Phi$ has the $(K,\delta)$-restricted isometry property, with high probability provided $M \ge \frac{81}{\delta^2}K^{1+1/q}\log\frac{eN}{K}$.

While flat restricted orthogonality comes with a negligible penalty of $\log^2K$ in the number of measurements, the power method has a penalty of $K^{1/q}$. As such, the case $q = 1$ uses the order of $K^2$ measurements, which matches our calculation in (3.10). Moreover, the power method with $q = 2$ can demonstrate RIP with $K^{3/2}$ measurements (up to log factors), i.e., $K \approx M^{1/2+1/6}$, which is considerably better than an $\varepsilon$ improvement over the Gershgorin technique.

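Proof of Theorem 34. Take $t := \frac{\delta}{3K^{1/(2q)}} - (\frac{K}{M})^{1/2}$ and pick $\mathcal{K} \subseteq \{1,\ldots,N\}$ with $|\mathcal{K}| = K$. Then Theorem II.13 of [58] states

$$\Pr\bigg[1 - \sqrt{\frac{K}{M}} - t < \sigma_{\min}(\Phi_{\mathcal{K}}) \le \sigma_{\max}(\Phi_{\mathcal{K}}) < 1 + \sqrt{\frac{K}{M}} + t\bigg] > 1 - 2e^{-Mt^2/2}.$$

Continuing, we use the fact that $\lambda(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}}) = \sigma(\Phi_{\mathcal{K}})^2$ to get

$$\begin{aligned}
1 - 2e^{-Mt^2/2} &< \Pr\bigg[\Big(1 - \sqrt{\tfrac{K}{M}} - t\Big)^2 < \lambda_{\min}(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}}) \le \lambda_{\max}(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}}) < \Big(1 + \sqrt{\tfrac{K}{M}} + t\Big)^2\bigg]\\
&\le \Pr\bigg[1 - 3\Big(\sqrt{\tfrac{K}{M}} + t\Big) < \lambda_{\min}(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}}) \le \lambda_{\max}(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}}) < 1 + 3\Big(\sqrt{\tfrac{K}{M}} + t\Big)\bigg],
\end{aligned} \tag{3.11}$$

where the last inequality follows from the fact that $\sqrt{K/M} + t = \frac{\delta}{3K^{1/(2q)}} < 1$. Since $\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}}$ and $I_K$ are simultaneously diagonalizable, the spectrum of $\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K$ is given by $\lambda(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K) = \lambda(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}}) - 1$. Combining this with (3.11) then gives

$$\Pr\bigg[\big\|\lambda(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)\big\|_\infty < 3\Big(\sqrt{\tfrac{K}{M}} + t\Big) = \frac{\delta}{K^{1/(2q)}}\bigg] > 1 - 2e^{-Mt^2/2}.$$

Considering $\operatorname{Tr}[A^{2q}]^{1/(2q)} = \|\lambda(A)\|_{2q} \le K^{1/(2q)}\|\lambda(A)\|_\infty$, we continue:

$$\Pr\Big[\operatorname{Tr}\big[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^{2q}\big]^{\frac{1}{2q}} < \delta\Big] \ge \Pr\Big[K^{1/(2q)}\big\|\lambda(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)\big\|_\infty < \delta\Big] > 1 - 2e^{-Mt^2/2}.$$

From here, we perform a union bound over all possible choices of $\mathcal{K}$:

$$\Pr\Big[\exists\mathcal{K} \text{ s.t. } \operatorname{Tr}\big[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^{2q}\big]^{\frac{1}{2q}} \ge \delta\Big] \le \binom{N}{K}\Pr\Big[\operatorname{Tr}\big[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^{2q}\big]^{\frac{1}{2q}} \ge \delta\Big] < 2\exp\Big(-\frac{Mt^2}{2} + K\log\frac{eN}{K}\Big). \tag{3.12}$$

Rearranging $M \ge \frac{81}{\delta^2}K^{1+1/q}\log\frac{eN}{K}$ gives $K^{1/2} \le \frac{\delta M^{1/2}}{9K^{1/(2q)}}$, and so

$$\frac{Mt^2}{2} = \frac{1}{2}\Big(\frac{\delta M^{1/2}}{3K^{1/(2q)}} - K^{1/2}\Big)^2 \ge \frac{1}{2}\Big(\frac{2\delta M^{1/2}}{9K^{1/(2q)}}\Big)^2 \ge 2K\log\frac{eN}{K}. \tag{3.13}$$

Combining (3.12) and (3.13) gives the result. □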
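The final step of this proof is pure arithmetic, so it can be spot-checked: with $M = \lceil\frac{81}{\delta^2}K^{1+1/q}\log\frac{eN}{K}\rceil$ and $t$ as above, one needs $t > 0$, $\sqrt{K/M} + t < 1$ (used in (3.11)), and inequality (3.13). The helper name and the grid of parameters below are our own choices.

```python
import math


def check_theorem34_choice(delta, q, K, N):
    """Plug M = ceil(81/delta^2 K^(1+1/q) log(eN/K)) into the proof of Theorem 34."""
    log_term = math.log(math.e * N / K)
    M = math.ceil(81 / delta ** 2 * K ** (1 + 1 / q) * log_term)
    t = delta / (3 * K ** (1 / (2 * q))) - math.sqrt(K / M)
    conditions = {
        "t_positive": t > 0,
        "below_one": math.sqrt(K / M) + t < 1,          # used in (3.11)
        "eq_3_13": M * t ** 2 / 2 >= 2 * K * log_term,  # inequality (3.13)
    }
    return M, conditions


checks = [check_theorem34_choice(delta, q, K, N)
          for delta in (0.1, 0.4)
          for q in (1, 2, 3)
          for K in (2, 10, 100)
          for N in (10 ** 3, 10 ** 6)]
```

Comparing the returned `M` for `q = 1` and `q = 2` at a fixed `K` shows the $K^{1/q}$ penalty shrinking from $K$ to $K^{1/2}$.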



3.3 Equiangular tight frames as RIP candidates 

In Chapter 1, we observed that equiangular tight frames (ETFs) are optimal RIP matrices under 
the Gershgorin analysis. In the present section, we reexamine ETFs as prospective RIP matrices. 
Specifically, we consider the possibility that certain classes of $M \times N$ ETFs support sparsity levels
$K$ larger than the order of $\sqrt{M}$. Before analyzing RIP, let's first observe some important features of
ETFs. Recall that Section 0.2 characterized ETFs in terms of their rows and columns. Interestingly, 
real ETFs have a natural alternative characterization. 

Let $\Phi$ be a real $M \times N$ ETF, and consider the corresponding Gram matrix $\Phi^*\Phi$. Observing Section 0.2, we have from (ii) that the diagonal entries of $\Phi^*\Phi$ are 1's. Also, (iii) indicates that the off-diagonal entries are equal in absolute value (to the Welch bound); since $\Phi$ has real entries, each off-diagonal entry of $\Phi^*\Phi$ is either positive or negative. Letting $\mu$ denote the absolute value of the off-diagonal entries, we can decompose the Gram matrix as $\Phi^*\Phi = I_N + \mu S$, where $S$ is a matrix of zeros on the diagonal and $\pm1$'s on the off-diagonal. Here, $S$ is referred to as a Seidel adjacency matrix, as $S$ encodes the adjacency rule of a simple graph with $i \leftrightarrow j$ whenever $S[i,j] = -1$; this correspondence originated in [139].

There is an important equivalence class amongst ETFs: given an ETF $\Phi$, one can negate any of the columns to form another ETF $\Phi'$. Indeed, the ETF properties in Section 0.2 are easily verified to hold for this new matrix. For obvious reasons, $\Phi$ and $\Phi'$ are called flipping equivalent. This equivalence plays a key role in the following result, which characterizes real ETFs in terms of a particular class of strongly regular graphs:

Definition 35. We say a simple graph $G$ is strongly regular of the form $\operatorname{srg}(v,k,\lambda,\mu)$ if

(i) $G$ has $v$ vertices,

(ii) every vertex has $k$ neighbors (i.e., $G$ is $k$-regular),

(iii) every two adjacent vertices have $\lambda$ common neighbors, and

(iv) every two non-adjacent vertices have $\mu$ common neighbors.

Theorem 36 (Corollary 5.6 in [141]). Every real $M \times N$ equiangular tight frame with $N > M+1$ is flipping equivalent to a frame whose Seidel adjacency matrix corresponds to the join of a vertex with a strongly regular graph of the form

$$\operatorname{srg}\Big(N-1,\,L,\,\frac{3L-N}{2},\,\frac{L}{2}\Big), \qquad L := \frac{N}{2} - 1 + \Big(1 - \frac{N}{2M}\Big)\sqrt{\frac{M(N-1)}{N-M}}.$$

Conversely, every such graph corresponds to flipping equivalence classes of equiangular tight frames in the same manner.
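Theorem 36 can be checked on a small example. The sketch below uses, as an assumption of ours, the real $3 \times 6$ ETF formed by the six diagonals of the icosahedron (coherence $1/\sqrt{5}$), builds its Seidel matrix, flips columns so that the first vertex is joined to all others, and tests whether the remaining graph is $\operatorname{srg}(N-1, L, \frac{3L-N}{2}, \frac{L}{2})$. Here $L = 2$, so the expected graph is the 5-cycle, $\operatorname{srg}(5,2,0,1)$.

```python
import numpy as np

g = (1 + np.sqrt(5)) / 2  # golden ratio
# The six diagonals of the icosahedron: a real 3 x 6 ETF (illustrative choice)
V = np.array([[0, 1, g], [0, -1, g], [1, g, 0],
              [-1, g, 0], [g, 0, 1], [g, 0, -1]]).T
Phi = V / np.linalg.norm(V, axis=0)
M, N = Phi.shape
mu = np.sqrt((N - M) / (M * (N - 1)))  # Welch bound
G = Phi.T @ Phi
S = np.rint((G - np.eye(N)) / mu).astype(int)  # Seidel adjacency matrix
np.fill_diagonal(S, 0)

# Flip columns so that vertex 0 is adjacent (S = -1) to every other vertex
signs = np.where(S[0] > 0, -1, 1)
signs[0] = 1
S_flip = S * np.outer(signs, signs)
A = (S_flip[1:, 1:] == -1).astype(int)  # graph on the remaining N - 1 vertices


def srg_parameters(A):
    """Return (v, k, lambda, mu) if the 0/1 adjacency matrix A is strongly regular, else None."""
    v = A.shape[0]
    degrees = set(A.sum(axis=1).tolist())
    if len(degrees) != 1:
        return None
    common = A @ A  # common[i, j] = number of common neighbors of i and j
    adj = {int(common[i, j]) for i in range(v) for j in range(v) if i != j and A[i, j]}
    non = {int(common[i, j]) for i in range(v) for j in range(v) if i != j and not A[i, j]}
    if len(adj) != 1 or len(non) != 1:
        return None
    return v, degrees.pop(), adj.pop(), non.pop()


L = N / 2 - 1 + (1 - N / (2 * M)) * np.sqrt(M * (N - 1) / (N - M))
params = srg_parameters(A)
```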

The first chapter illustrated the main issue with the Gershgorin analysis: it ignores important 
cancellations in the sub-Gram matrices. We suspect that such cancellations would be more easily 
observed in a real ETF, since Theorem 36 neatly represents the Gram matrix's off-diagonal oscilla- 
tions in terms of adjacencies in a strongly regular graph. The following result gives a taste of how 
useful this graph representation can be: 

Theorem 37. Take a real equiangular tight frame $\Phi$ with worst-case coherence $\mu$, and let $G$ denote the corresponding strongly regular graph in Theorem 36. Then the restricted isometry constant of $\Phi$ is given by $\delta_K = (K-1)\mu$ for every $K \le \omega(G) + 1$, where $\omega(G)$ denotes the size of the largest clique in $G$.

Proof. The Gershgorin analysis (1.4) gives the bound $\delta_K \le (K-1)\mu$, and so it suffices to prove $\delta_K \ge (K-1)\mu$. Since negating columns does not change $\delta_K$, we may take $\Phi$ to be the flipping-equivalent frame from Theorem 36. Since $K \le \omega(G) + 1$, there exists a clique of size $K$ in the join of $G$ with a vertex. Let $\mathcal{K}$ denote the vertices of this clique, and take $S_{\mathcal{K}}$ to be the corresponding Seidel adjacency submatrix. In this case, $S_{\mathcal{K}} = I_K - J_K$, where $J_K$ is the $K \times K$ matrix of all 1's. Observing the decomposition $\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} = I_K + \mu S_{\mathcal{K}}$, it follows from (1.1) that

$$\delta_K \ge \|\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K\|_2 = \|\mu S_{\mathcal{K}}\|_2 = \mu\|I_K - J_K\|_2 = (K-1)\mu,$$

which concludes the proof. □
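On the same illustrative icosahedron ETF, the graph from Theorem 36 is the 5-cycle, whose clique number is 2, so Theorem 37 predicts $\delta_K = (K-1)\mu$ for $K \le 3$. The brute-force sketch below (the helper name is ours) confirms this for $K = 2, 3$.

```python
import itertools
import numpy as np

g = (1 + np.sqrt(5)) / 2
# Illustrative real ETF: the six icosahedron diagonals in R^3
V = np.array([[0, 1, g], [0, -1, g], [1, g, 0],
              [-1, g, 0], [g, 0, 1], [g, 0, -1]]).T
Phi = V / np.linalg.norm(V, axis=0)
M, N = Phi.shape
mu = np.sqrt((N - M) / (M * (N - 1)))


def ric(Phi, K):
    """Exact delta_K by exhaustive search over size-K column subsets."""
    best = 0.0
    for S in itertools.combinations(range(Phi.shape[1]), K):
        cols = Phi[:, list(S)]
        best = max(best, np.linalg.norm(cols.T @ cols - np.eye(K), 2))
    return best


deltas = {K: ric(Phi, K) for K in (2, 3)}
```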

This result indicates that the Gershgorin analysis is tight for all real ETFs, at least for sufficiently small values of $K$. In particular, in order for a real ETF to be RIP beyond the square-root bottleneck, its graph must have a small clique number. As an example, note that the first four columns of the Steiner ETF in (1.6) have negative inner products with each other, and thus the corresponding subgraph is a clique. In general, each block of an $M \times N$ Steiner ETF, whose size is guaranteed to be $O(\sqrt{M})$, is a lower-dimensional simplex and therefore has this property; this is an alternative proof that the Gershgorin analysis of Steiner ETFs is tight for $K = O(\sqrt{M})$.



3.3.1 Equiangular tight frames with flat restricted orthogonality 

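To find ETFs that are RIP beyond the square-root bottleneck, we must apply better techniques than Gershgorin. We first consider what it means for an ETF to have $(K,\theta)$-flat restricted orthogonality. Take a real ETF $\Phi = [\varphi_1 \cdots \varphi_N]$ with worst-case coherence $\mu$, and note that the corresponding Seidel adjacency matrix $S$ can be expressed in terms of the usual $\{0,1\}$-adjacency matrix $A$ of the same graph: $S[i,j] = 1 - 2A[i,j]$ whenever $i \ne j$. Therefore, for every disjoint $\mathcal{I},\mathcal{J} \subseteq \{1,\ldots,N\}$ with $|\mathcal{I}|,|\mathcal{J}| \le K$, we want

$$\theta(|\mathcal{I}||\mathcal{J}|)^{1/2} \ge \bigg|\Big\langle\sum_{i\in\mathcal{I}}\varphi_i,\sum_{j\in\mathcal{J}}\varphi_j\Big\rangle\bigg| = \mu\bigg|\sum_{i\in\mathcal{I}}\sum_{j\in\mathcal{J}}S[i,j]\bigg| = 2\mu\Big|E(\mathcal{I},\mathcal{J}) - \frac{1}{2}|\mathcal{I}||\mathcal{J}|\Big|, \tag{3.14}$$

where $E(\mathcal{I},\mathcal{J})$ denotes the number of edges between $\mathcal{I}$ and $\mathcal{J}$ in the graph. This condition bears a striking resemblance to the following well-known result in graph theory: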

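Lemma 38 (Expander mixing lemma [85]). Given a $d$-regular graph of $n$ vertices with adjacency eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$, the second largest eigenvalue in absolute value, $\lambda := \max\{|\lambda_2|,|\lambda_n|\}$, satisfies

$$\Big|E(\mathcal{I},\mathcal{J}) - \frac{d}{n}|\mathcal{I}||\mathcal{J}|\Big| \le \lambda(|\mathcal{I}||\mathcal{J}|)^{1/2}$$

for every pair of vertex subsets $\mathcal{I}$, $\mathcal{J}$.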
In words, the expander mixing lemma says that the number of edges between vertex subsets of a regular graph is roughly what you would expect in a random regular graph. For this lemma to be applicable to (3.14), we need the strongly regular graph of Theorem 36 to satisfy $\frac{L}{N-1} \approx \frac{1}{2}$. Using the formula for $L$, it is not difficult to show that $|\frac{L}{N-1} - \frac{1}{2}| = O(M^{-1/2})$ provided $N = O(M)$ and $N \ge 2M$. Furthermore, the second largest eigenvalue of the strongly regular graph will be $\lambda \approx \frac{1}{2}N^{1/2}$, and so the expander mixing lemma says the optimal $\theta$ is $\le 2\mu\lambda \approx (\frac{N-M}{M})^{1/2}$ since $\mu = (\frac{N-M}{M(N-1)})^{1/2}$. This is a rather weak estimate for $\theta$ because the expander mixing lemma does not account for the sizes of $\mathcal{I}$ and $\mathcal{J}$ being $\le K$. Put in this light, a real ETF that has flat restricted orthogonality corresponds to a strongly regular graph that satisfies a particularly strong version of the expander mixing lemma.
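A randomized spot check of Lemma 38: the sketch samples pairs of vertex subsets of the Petersen graph (an illustrative choice with $d = 3$, $n = 10$, and $\lambda = 2$) and records the largest normalized deviation $|E(\mathcal{I},\mathcal{J}) - \frac{d}{n}|\mathcal{I}||\mathcal{J}||/(|\mathcal{I}||\mathcal{J}|)^{1/2}$, which the lemma caps at $\lambda$. Here $E(\mathcal{I},\mathcal{J})$ is computed as $\chi_{\mathcal{I}}^\top A\chi_{\mathcal{J}}$, which counts ordered pairs when the subsets overlap. Sampling only probes the inequality; it does not verify it for all pairs.

```python
import numpy as np

# Petersen graph (3-regular, 10 vertices), used only as an example
A = np.zeros((10, 10))
for i in range(5):
    for u, v in [(i, (i + 1) % 5), (5 + i, 5 + (i + 2) % 5), (i, 5 + i)]:
        A[u, v] = A[v, u] = 1
n, d = 10, 3
eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
lam = max(abs(eigs[1]), abs(eigs[-1]))  # second largest eigenvalue in absolute value

rng = np.random.default_rng(7)
worst = 0.0
for _ in range(5000):
    I = (rng.random(n) < 0.5).astype(float)  # indicator vector of a random subset
    J = (rng.random(n) < 0.5).astype(float)
    if I.sum() == 0 or J.sum() == 0:
        continue
    E_IJ = I @ A @ J  # edges between I and J (ordered pairs if the sets overlap)
    deviation = abs(E_IJ - d / n * I.sum() * J.sum())
    worst = max(worst, deviation / np.sqrt(I.sum() * J.sum()))
```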

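3.3.2 Equiangular tight frames and the power method

Next, we try applying the power method to ETFs. Given a real ETF $\Phi = [\varphi_1 \cdots \varphi_N]$, let $H := \Phi^*\Phi - I_N$ denote the "hollow" Gram matrix. Also, take $E_{\mathcal{K}}$ to be the $N \times K$ matrix built from the columns of $I_N$ that are indexed by $\mathcal{K}$. Then

$$\operatorname{Tr}\big[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^{2q}\big] = \operatorname{Tr}\big[(E_{\mathcal{K}}^*\Phi^*\Phi E_{\mathcal{K}} - I_K)^{2q}\big] = \operatorname{Tr}\big[(E_{\mathcal{K}}^*HE_{\mathcal{K}})^{2q}\big] = \operatorname{Tr}\big[(HE_{\mathcal{K}}E_{\mathcal{K}}^*)^{2q}\big].$$

Since $E_{\mathcal{K}}E_{\mathcal{K}}^* = \sum_{k\in\mathcal{K}}\delta_k\delta_k^*$, where $\delta_k$ is the $k$th identity basis element, we continue:

$$\operatorname{Tr}\big[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^{2q}\big] = \operatorname{Tr}\bigg[\Big(\sum_{k\in\mathcal{K}}H\delta_k\delta_k^*\Big)^{2q}\bigg] = \sum_{k_0\in\mathcal{K}}\cdots\sum_{k_{2q-1}\in\mathcal{K}}\delta_{k_0}^*H\delta_{k_1}\,\delta_{k_1}^*H\delta_{k_2}\cdots\delta_{k_{2q-1}}^*H\delta_{k_0}, \tag{3.15}$$

where the last step used the cyclic property of the trace. From here, note that $H$ has a zero diagonal, meaning several of the terms in (3.15) are zero, namely, those for which $k_{t+1} = k_t$ for some $t \in \mathbb{Z}_{2q}$. To simplify (3.15), take $\mathcal{K}^{(2q)}$ to be the set of $2q$-tuples satisfying $k_{t+1} \ne k_t$ for every $t \in \mathbb{Z}_{2q}$:

$$\operatorname{Tr}\big[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^{2q}\big] = \sum_{\{k_t\}\in\mathcal{K}^{(2q)}}\prod_{t\in\mathbb{Z}_{2q}}H[k_t,k_{t+1}] = \mu^{2q}\sum_{\{k_t\}\in\mathcal{K}^{(2q)}}\prod_{t\in\mathbb{Z}_{2q}}S[k_t,k_{t+1}], \tag{3.16}$$

where $\mu$ is the worst-case coherence of $\Phi$ and $S$ is the corresponding Seidel adjacency matrix. Note that the left-hand side is necessarily nonnegative, while it is not immediate why the right-hand side should be. This indicates that more simplification can be done, but for the sake of clarity, we will perform this simplification in the special case where $q = 2$; the general case is very similar. When $q = 2$, we are concerned with 4-tuples $\{k_0,k_1,k_2,k_3\} \in \mathcal{K}^{(4)}$. Let's partition these 4-tuples according to the value taken by $k_0$ and $k_2$. Note, for a fixed $k_0$ and $k_2$, that $k_1$ can be any value other than $k_0$ or $k_2$, as can $k_3$. This leads to the following simplification (using the symmetry of $S$):

$$\begin{aligned}
\sum_{\{k_t\}\in\mathcal{K}^{(4)}}\prod_{t\in\mathbb{Z}_4}S[k_t,k_{t+1}]
&= \sum_{k_0\in\mathcal{K}}\sum_{k_2\in\mathcal{K}}\bigg(\sum_{\substack{k_1\in\mathcal{K}\\ k_1\notin\{k_0,k_2\}}}S[k_0,k_1]S[k_1,k_2]\bigg)\bigg(\sum_{\substack{k_3\in\mathcal{K}\\ k_3\notin\{k_0,k_2\}}}S[k_2,k_3]S[k_3,k_0]\bigg)\\
&= \sum_{k_0\in\mathcal{K}}\sum_{k_2\in\mathcal{K}}\bigg(\sum_{\substack{k\in\mathcal{K}\\ k\notin\{k_0,k_2\}}}S[k_0,k]S[k,k_2]\bigg)^2\\
&= \sum_{k_0\in\mathcal{K}}\bigg(\sum_{\substack{k\in\mathcal{K}\\ k\ne k_0}}S[k_0,k]S[k,k_0]\bigg)^2 + \sum_{k_0\in\mathcal{K}}\sum_{\substack{k_2\in\mathcal{K}\\ k_2\ne k_0}}\bigg(\sum_{\substack{k\in\mathcal{K}\\ k\notin\{k_0,k_2\}}}S[k_0,k]S[k,k_2]\bigg)^2.
\end{aligned}$$

The first term above is $K(K-1)^2$, while the other term is not as easy to analyze, as we expect a certain degree of cancellation. Substituting this simplification into (3.16) gives

$$\operatorname{Tr}\big[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^4\big] = \mu^4\bigg(K(K-1)^2 + \sum_{k_0\in\mathcal{K}}\sum_{\substack{k_2\in\mathcal{K}\\ k_2\ne k_0}}\bigg(\sum_{\substack{k\in\mathcal{K}\\ k\notin\{k_0,k_2\}}}S[k_0,k]S[k,k_2]\bigg)^2\bigg).$$

If there were no cancellations in the second term, then it would equal $K(K-1)(K-2)^2$, thereby dominating the expression. However, if oscillations occurred as a $\pm1$ Bernoulli random variable, we could expect this term to be on the order of $K^3$, matching the order of the first term. In this hypothetical case, since $\mu \le M^{-1/2}$, the parameter $\delta_{K;2}$ defined in Theorem 33 scales as $\frac{K^{3/4}}{M^{1/2}}$, and so $M \sim K^{3/2}$; this corresponds to the behavior exhibited in Theorem 34. To summarize, much like flat restricted orthogonality, applying the power method to ETFs leads to interesting combinatorial questions regarding subgraphs, even when $q = 2$.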
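The $q = 2$ identity can be verified numerically by comparing $\operatorname{Tr}[(\Phi_{\mathcal{K}}^*\Phi_{\mathcal{K}} - I_K)^4]$ with $\mu^4\big(K(K-1)^2 + \text{(second term)}\big)$ for every subset $\mathcal{K}$ of a small real ETF. The sketch again uses the icosahedron ETF as an illustrative assumption, and also checks that the second term never exceeds its no-cancellation value $K(K-1)(K-2)^2$.

```python
import itertools
import numpy as np

g = (1 + np.sqrt(5)) / 2
# Illustrative real ETF: the six icosahedron diagonals in R^3
V = np.array([[0, 1, g], [0, -1, g], [1, g, 0],
              [-1, g, 0], [g, 0, 1], [g, 0, -1]]).T
Phi = V / np.linalg.norm(V, axis=0)
M, N = Phi.shape
mu = np.sqrt((N - M) / (M * (N - 1)))
S = np.rint((Phi.T @ Phi - np.eye(N)) / mu).astype(int)  # Seidel adjacency matrix
np.fill_diagonal(S, 0)


def trace_direct(Kset):
    """Tr[(Phi_K^* Phi_K - I_K)^4]."""
    cols = Phi[:, list(Kset)]
    B = cols.T @ cols - np.eye(len(Kset))
    return np.trace(np.linalg.matrix_power(B, 4))


def trace_combinatorial(Kset):
    """mu^4 (K(K-1)^2 + sum over k0 != k2 of (sum_k S[k0,k] S[k,k2])^2)."""
    K = len(Kset)
    second = 0
    for k0 in Kset:
        for k2 in Kset:
            if k2 == k0:
                continue
            s = sum(S[k0, k] * S[k, k2] for k in Kset if k not in (k0, k2))
            second += s ** 2
    return mu ** 4 * (K * (K - 1) ** 2 + second), second


rows = []
for K in (3, 4, 5, 6):
    for Kset in itertools.combinations(range(N), K):
        value, second = trace_combinatorial(Kset)
        rows.append((K, trace_direct(Kset), value, second))
```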



3.3.3 The Paley equiangular tight frame as an RIP candidate

Pick some prime $p \equiv 1 \bmod 4$, and build an $M \times p$ matrix $H$ by selecting the $M := \frac{p+1}{2}$ rows of the $p \times p$ discrete Fourier transform matrix which are indexed by $Q$, the quadratic residues modulo $p$ (including zero). To be clear, the entries of $H$ are scaled to have unit modulus. Next, take $D$ to be an $M \times M$ diagonal matrix whose zeroth diagonal entry is $p^{-1/2}$ and whose remaining $M-1$ entries are $(\frac{2}{p})^{1/2}$. Now build the matrix $\Phi$ by concatenating $DH$ with the zeroth identity basis element; for example, when $p = 5$, we have a $3 \times 6$ matrix:

$$\Phi = \begin{bmatrix}
\sqrt{\frac15} & \sqrt{\frac15} & \sqrt{\frac15} & \sqrt{\frac15} & \sqrt{\frac15} & 1\\
\sqrt{\frac25} & \sqrt{\frac25}e^{-2\pi i/5} & \sqrt{\frac25}e^{-2\pi i2/5} & \sqrt{\frac25}e^{-2\pi i3/5} & \sqrt{\frac25}e^{-2\pi i4/5} & 0\\
\sqrt{\frac25} & \sqrt{\frac25}e^{-2\pi i4/5} & \sqrt{\frac25}e^{-2\pi i3/5} & \sqrt{\frac25}e^{-2\pi i2/5} & \sqrt{\frac25}e^{-2\pi i/5} & 0
\end{bmatrix}.$$



We claim that in general, this process produces an $M \times 2M$ equiangular tight frame, which we call the Paley ETF [115]. Presuming for the moment that this claim is true, we have the following result which lends hope for the Paley ETF as an RIP matrix:

Lemma 39. An $M\times2M$ Paley equiangular tight frame has restricted isometry constant $\delta_K<1$ for all $K\leq M$.

Proof. First, we note that Theorem 16 used Chebotarev's theorem [126] to prove that the spark of the $M\times2M$ Paley ETF $\Phi$ is $M+1$, that is, every size-$M$ subcollection of columns of $\Phi$ forms a spanning set. Thus, for every $\mathcal{K}\subseteq\{1,\ldots,2M\}$ of size $\leq M$, the smallest singular value of $\Phi_\mathcal{K}$ is positive. It remains to show that the square of the largest singular value is strictly less than 2. Let $x$ be a unit vector for which $\|\Phi_\mathcal{K}^*x\|=\|\Phi_\mathcal{K}^*\|_2$. Since the spark of $\Phi$ is $M+1$ and $|\mathcal{K}^c|\geq M$, the columns of $\Phi_{\mathcal{K}^c}$ span, so $\Phi_{\mathcal{K}^c}^*x\neq0$ and

$$\|\Phi_\mathcal{K}\|_2^2=\|\Phi_\mathcal{K}^*x\|^2<\|\Phi_\mathcal{K}^*x\|^2+\|\Phi_{\mathcal{K}^c}^*x\|^2=\|\Phi^*x\|^2\leq\|\Phi\|_2^2=\|\Phi\Phi^*\|_2=2,$$

where the final step follows from (i) and (ii) of Section 0.2, which imply $\Phi\Phi^*=2I_M$. □

Now that we have an interest in the Paley ETF, we wish to verify that it is, in fact, an ETF. It suffices to show that the columns of $\Phi$ have unit norm, and that the inner products between distinct columns equal the Welch bound in absolute value. The zeroth identity basis element is unit-norm. The squared norm of each of the other columns is $\frac1p+(M-1)\frac2p=\frac{2M-1}{p}=1$. The inner product between the zeroth identity basis element and any other column equals the zeroth entry of that column: $p^{-1/2}=\big(\frac{N-M}{M(N-1)}\big)^{1/2}$, where $N=2M=p+1$. It remains to calculate the inner product between distinct columns that are not identity basis elements. Since $a^2=b^2$ if and only if $a=\pm b$, the sequence $\{k^2\}_{k=1}^{p-1}\subseteq\mathbb{Z}_p$ doubly covers $Q\setminus\{0\}$, and so

$$\langle\varphi_n,\varphi_{n'}\rangle=\frac1p+\frac2p\sum_{m\in Q\setminus\{0\}}e^{-2\pi i(n-n')m/p}=\frac1p\sum_{k=0}^{p-1}e^{-2\pi i(n-n')k^2/p}.$$

This well-known expression is called a quadratic Gauss sum. Since $p\equiv1\bmod4$, its value is determined by the Legendre symbol as follows: $\langle\varphi_n,\varphi_{n'}\rangle=\frac{1}{\sqrt p}\big(\frac{n-n'}{p}\big)$ for every $n,n'\in\mathbb{Z}_p$ with $n\neq n'$, where

$$\Big(\frac kp\Big):=\begin{cases}+1&\text{if }k\text{ is a nonzero quadratic residue modulo }p,\\0&\text{if }k=0,\\-1&\text{otherwise.}\end{cases}$$



Having established that $\Phi$ is an ETF, we notice that the inner products between distinct columns of $\Phi$ are real. This implies that the columns of $\Phi$ can be unitarily rotated to form a real ETF $\Psi$. Indeed, one may take $\Psi$ to be the $M\times2M$ matrix formed by taking the nonzero rows of $L^T$ in the Cholesky factorization $\Phi^*\Phi=LL^T$. As such, we consider the Paley ETF to be real. From here, Theorem 36 prompts us to find the corresponding strongly regular graph. First, we can flip the identity basis element so that its inner products with the other columns of $\Phi$ are all negative. The corresponding vertex in the graph is then adjacent to each of the other vertices, and it is naturally the vertex to which the strongly regular graph is joined. For the remaining vertices, $n\leftrightarrow n'$ precisely when $\big(\frac{n-n'}{p}\big)=-1$, that is, when $n'-n$ is not a quadratic residue. The corresponding subgraph is therefore the complement of the Paley graph. Since Paley graphs are self-complementary, this is again the Paley graph [119]. In general, Paley graphs of prime order $p$ necessarily have $p\equiv1\bmod4$, so this correspondence is particularly natural.
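The construction and the facts just established can be confirmed numerically for small $p$: unit-norm columns, Legendre-symbol inner products (hence equiangularity), and tightness $\Phi\Phi^*=2I_M$. The following pure-Python sketch makes two assumptions. It uses the inner product $\langle x,y\rangle=\sum_ix_i\overline{y_i}$, and it orders the rows of $H$ by increasing residue, so that row 0 corresponds to the residue 0. Function names are illustrative.

```python
import cmath
import math


def legendre(k, p):
    """Legendre symbol (k/p) via Euler's criterion, with (0/p) = 0."""
    k %= p
    if k == 0:
        return 0
    return 1 if pow(k, (p - 1) // 2, p) == 1 else -1


def paley_etf(p):
    """Columns of the M x 2M Paley ETF: rows of the p x p DFT indexed by the
    quadratic residues Q (including 0), scaled by D, followed by the zeroth
    identity basis element as the last column."""
    if p % 4 != 1:
        raise ValueError("p must be a prime congruent to 1 mod 4")
    Q = sorted({(k * k) % p for k in range(p)})      # Q[0] == 0, len(Q) == (p+1)/2
    d = [math.sqrt(1 / p)] + [math.sqrt(2 / p)] * (len(Q) - 1)
    cols = [[d[r] * cmath.exp(-2j * math.pi * m * n / p) for r, m in enumerate(Q)]
            for n in range(p)]
    cols.append([1.0] + [0.0] * (len(Q) - 1))
    return cols


def inner(x, y):
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def check_paley(p, tol=1e-9):
    """Raise AssertionError unless the construction is a unit-norm tight frame
    whose Gram entries match the Legendre-symbol formula; otherwise return True."""
    cols = paley_etf(p)
    M = (p + 1) // 2
    assert len(cols) == 2 * M and all(len(c) == M for c in cols)
    for c in cols:                                   # unit norm
        assert abs(inner(c, c) - 1) < tol
    for n in range(p):
        # identity column against a Fourier column: p^(-1/2)
        assert abs(inner(cols[p], cols[n]) - 1 / math.sqrt(p)) < tol
        for n2 in range(p):                          # Fourier against Fourier
            if n != n2:
                expected = legendre(n - n2, p) / math.sqrt(p)
                assert abs(inner(cols[n], cols[n2]) - expected) < tol
    for i in range(M):                               # Phi Phi^* = 2 I_M
        for j in range(M):
            s = sum(c[i] * complex(c[j]).conjugate() for c in cols)
            assert abs(s - (2 if i == j else 0)) < tol
    return True


if __name__ == "__main__":
    for p in (5, 13, 17, 29):
        print(p, check_paley(p))
```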

One interesting thing about the Paley ETF's restricted isometry is that it lends insight into 
important properties of the Paley graph. The following is the best known upper bound for the clique 
number of the Paley graph of prime order (see Theorem 13.14 of [28] and discussion thereafter), and 
we give a new proof of this bound using restricted isometry: 

Theorem 40. Let $G$ denote the Paley graph of prime order $p$. Then the size of the largest clique is $\omega(G)<\sqrt p$.

Proof. We start by showing $\omega(G)+1\leq M$. Suppose otherwise, so that there exists a clique $\mathcal{K}$ of size $M+1$ in the join of a vertex with $G$. Then the corresponding sub-Gram matrix of the Paley ETF has the form $\Phi_\mathcal{K}^*\Phi_\mathcal{K}=(1+\mu)I_{M+1}-\mu J_{M+1}$, where $\mu=p^{-1/2}$ is the worst-case coherence and $J_{M+1}$ is the $(M+1)\times(M+1)$ matrix of 1's. Since the largest eigenvalue of $J_{M+1}$ is $M+1$, the smallest eigenvalue of $\Phi_\mathcal{K}^*\Phi_\mathcal{K}$ is $1+\mu-(M+1)\mu=1-Mp^{-1/2}=1-\frac12(p+1)p^{-1/2}$. This is negative for every $p\geq5$, which contradicts the fact that $\Phi_\mathcal{K}^*\Phi_\mathcal{K}$ is positive semidefinite.

Since $\omega(G)+1\leq M$, we can apply Lemma 39 and Theorem 37 to get

$$1>\delta_{\omega(G)+1}\geq\frac{\omega(G)}{\sqrt p},\qquad(3.17)$$

and rearranging gives the result. □
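For small primes, Theorem 40 can be compared against the exact clique number. The sketch below is a brute-force search in pure Python, so it is exponential in general and only suitable for small $p$. It builds the Paley graph from the quadratic residues as described above.

```python
import math


def paley_graph(p):
    """Adjacency sets of the Paley graph of prime order p (p = 1 mod 4):
    n ~ m iff n - m is a nonzero quadratic residue mod p."""
    residues = {(k * k) % p for k in range(1, p)}
    return [{m for m in range(p) if m != n and (n - m) % p in residues}
            for n in range(p)]


def clique_number(adj):
    """Size of a largest clique, by branch-and-bound over cliques listed
    in increasing vertex order (fine for the small p used here)."""
    best = 0

    def extend(size, candidates):
        nonlocal best
        best = max(best, size)
        if size + len(candidates) <= best:    # cannot beat the current best
            return
        for v in sorted(candidates):
            extend(size + 1, {u for u in candidates if u > v} & adj[v])

    extend(0, set(range(len(adj))))
    return best


if __name__ == "__main__":
    for p in (5, 13, 17, 29, 37, 41):
        w = clique_number(paley_graph(p))
        print(f"p = {p:2d}: omega = {w}, sqrt(p) = {math.sqrt(p):.2f}")
```

For these small primes, the clique number sits well below $\sqrt p$, in line with the discussion that follows.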



It is common to apply probabilistic and heuristic reasoning to gain intuition in number theory. For example, consecutive entries of the Legendre symbol are known to mimic certain properties of a $\pm1$ Bernoulli random variable [110]. Moreover, Paley graphs enjoy a certain quasi-random property that was studied in [50]. On the other hand, Graham and Ringrose [76] showed that Paley graphs of prime order deviate from random behavior. Random graphs of size $p$ have an expected clique number of $(1+o(1))2\log p/\log2$, whereas Paley graphs of prime order have a clique number $\geq c\log p\log\log\log p$ infinitely often. The best known universal lower bound, $(1/2+o(1))\log p/\log2$, is given in [51], which indicates that the random graph analysis is at least tight in some sense. Regardless, there is a significant gap between these lower bounds and the upper bound $\sqrt p$ in Theorem 40. It would be nice if probabilistic arguments could be leveraged to improve this bound, or at least to provide some intuition.

Note that our proof (3.17) hinged on the fact that $\delta_{\omega(G)+1}<1$, courtesy of Lemma 39. Hence, any improvement to our estimate for $\delta_{\omega(G)+1}$ would directly improve the best known upper bound on the Paley graph's clique number. To approach such an improvement, note that for large $p$, the Fourier portion $DH$ of the Paley ETF is not significantly different from the normalized partial Fourier matrix $(\frac{2}{p+1})^{1/2}H$. The squared row weights of $DH$ are $\frac1p$ for the zeroth row and $\frac2p$ for the remaining rows, and each differs from $\frac{2}{p+1}$ by $O(1/p)$. If we view the quadratic residues modulo $p$ (the row indices of $H$) as random, then a random partial Fourier matrix serves as a proxy for the Fourier portion of the Paley ETF. With this in mind, we appeal to the following:

Theorem 41 (Theorem 3.2 in [114]). Draw rows from the $N\times N$ discrete Fourier transform matrix uniformly at random with replacement to construct an $M\times N$ matrix, and then normalize the columns to form $\Phi$. Then $\Phi$ has restricted isometry constant $\delta_K\leq\delta$ with probability $1-\varepsilon$ provided $\frac{M}{\log M}\geq C\delta^{-2}K\log^2K\log N\log\varepsilon^{-1}$, where $C$ is a universal constant.

In our case, both $M$ and $N$ scale as $p$, and so picking $\delta$ to achieve equality above gives

$$\delta^2=\frac{C'}{p}K\log^2K\log^2p\log\varepsilon^{-1}.$$
Continuing as in (3.17), denote $\omega=\omega(G)$ and take $K=\omega$ to get

$$\frac{C'}{p}\,\omega\log^2\omega\log^2p\log\varepsilon^{-1}=\delta^2\geq\delta_\omega^2\geq\frac{(\omega-1)^2}{p}\geq\frac{\omega^2}{2p},$$

where the last step holds whenever $\omega\geq4$. Rearranging then gives $\omega/\log^2\omega\leq C''\log^2p\log\varepsilon^{-1}$ with probability $1-\varepsilon$. Taking $\varepsilon=1/p$ gives $\omega/\log^2\omega=O(\log^3p)$ with high probability (again, under the model that quadratic residues are random). Interestingly, this agrees with the results of Graham and Ringrose [76]. This gives some intuition for what we can expect the size of the Paley graph's clique number to be, while at the same time demonstrating the power of Paley ETFs as RIP candidates. We conclude with the following, which can be reformulated in terms of both flat restricted orthogonality and the power method:

Conjecture 42. The Paley equiangular tight frame has the $(K,\delta)$-restricted isometry property with some $\delta<\sqrt2-1$ whenever $K\leq\frac{Cp}{\log^\alpha p}$, for some universal constants $C$ and $\alpha$.

3.4 Appendix 

In this section, we prove Theorem 30, which states that a matrix with $(K,\theta)$-flat restricted orthogonality has $\theta_K\leq C\theta\log K$, that is, it has restricted orthogonality. The proof below is adapted from the proof of Lemma 3 in [29]. Our proof has the benefit of being valid for all values of $K$ (as opposed to sufficiently large $K$ in the original [29]), and it has near-optimal constants where appropriate. Moreover, in this version, the columns of the matrix are not required to have unit norm.

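Proof of Theorem 30. Given arbitrary disjoint subsets $\mathcal{I},\mathcal{J}\subseteq\{1,\ldots,N\}$ with $|\mathcal{I}|,|\mathcal{J}|\leq K$, we will bound the following quantity three times, each time with different constraints on $\{x_i\}_{i\in\mathcal{I}}$ and $\{y_j\}_{j\in\mathcal{J}}$:

$$\bigg|\bigg\langle\sum_{i\in\mathcal{I}}x_i\varphi_i,\sum_{j\in\mathcal{J}}y_j\varphi_j\bigg\rangle\bigg|.\qquad(3.18)$$

To be clear, our third bound will have no constraints on $\{x_i\}_{i\in\mathcal{I}}$ and $\{y_j\}_{j\in\mathcal{J}}$, thereby demonstrating restricted orthogonality. Note that by assumption, (3.18) is $\leq\theta(|\mathcal{I}||\mathcal{J}|)^{1/2}$ whenever the $x_i$'s and $y_j$'s are in $\{0,1\}$. We first show that this bound is preserved when we relax the $x_i$'s and $y_j$'s to lie in the interval $[0,1]$.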

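Pick a disjoint pair of subsets $\mathcal{I}',\mathcal{J}'\subseteq\{1,\ldots,N\}$ with $|\mathcal{I}'|,|\mathcal{J}'|\leq K$. Starting with some $k\in\mathcal{I}'$, note that flat restricted orthogonality gives

$$\bigg|\bigg\langle\sum_{i\in\mathcal{I}\setminus\{k\}}\varphi_i,\sum_{j\in\mathcal{J}}\varphi_j\bigg\rangle\bigg|\leq\theta(|\mathcal{I}\setminus\{k\}||\mathcal{J}|)^{1/2}\leq\theta(|\mathcal{I}||\mathcal{J}|)^{1/2},\qquad\bigg|\bigg\langle\sum_{i\in\mathcal{I}}\varphi_i,\sum_{j\in\mathcal{J}}\varphi_j\bigg\rangle\bigg|\leq\theta(|\mathcal{I}||\mathcal{J}|)^{1/2}$$

for every disjoint $\mathcal{I},\mathcal{J}\subseteq\{1,\ldots,N\}$ with $|\mathcal{I}|,|\mathcal{J}|\leq K$ and $k\in\mathcal{I}$. Thus, we may take any $x_k\in[0,1]$ to form a convex combination of these two expressions, and then the triangle inequality gives

$$\theta(|\mathcal{I}||\mathcal{J}|)^{1/2}\geq x_k\bigg|\bigg\langle\sum_{i\in\mathcal{I}}\varphi_i,\sum_{j\in\mathcal{J}}\varphi_j\bigg\rangle\bigg|+(1-x_k)\bigg|\bigg\langle\sum_{i\in\mathcal{I}\setminus\{k\}}\varphi_i,\sum_{j\in\mathcal{J}}\varphi_j\bigg\rangle\bigg|\geq\bigg|\bigg\langle x_k\varphi_k+\sum_{i\in\mathcal{I}\setminus\{k\}}\varphi_i,\sum_{j\in\mathcal{J}}\varphi_j\bigg\rangle\bigg|.\qquad(3.19)$$

Since (3.19) holds for every disjoint $\mathcal{I},\mathcal{J}\subseteq\{1,\ldots,N\}$ with $|\mathcal{I}|,|\mathcal{J}|\leq K$ and $k\in\mathcal{I}$, we can do the same thing with an additional index $i\in\mathcal{I}'$ or $j\in\mathcal{J}'$, and replace the corresponding unit coefficient with some $x_i$ or $y_j$ in $[0,1]$. Continuing in this way proves the claim that (3.18) is $\leq\theta(|\mathcal{I}||\mathcal{J}|)^{1/2}$ whenever the $x_i$'s and $y_j$'s lie in the interval $[0,1]$.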

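For the second bound, we assume the $x_i$'s and $y_j$'s are nonnegative with unit norm: $\sum_{i\in\mathcal{I}}x_i^2=\sum_{j\in\mathcal{J}}y_j^2=1$. To bound (3.18) in this case, we partition $\mathcal{I}$ and $\mathcal{J}$ according to the size of the corresponding coefficients:

$$\mathcal{I}_k:=\{i\in\mathcal{I}:2^{-(k+1)}<x_i\leq2^{-k}\},\qquad\mathcal{J}_k:=\{j\in\mathcal{J}:2^{-(k+1)}<y_j\leq2^{-k}\}.$$

The unit-norm constraints ensure that every coefficient is at most 1, so $\mathcal{I}=\bigcup_{k=0}^\infty\mathcal{I}_k$ and $\mathcal{J}=\bigcup_{k=0}^\infty\mathcal{J}_k$. Indices with a zero coefficient contribute nothing to (3.18), so we may discard them. The triangle inequality thus gives

$$\bigg|\bigg\langle\sum_{i\in\mathcal{I}}x_i\varphi_i,\sum_{j\in\mathcal{J}}y_j\varphi_j\bigg\rangle\bigg|\leq\sum_{k_1=0}^\infty\sum_{k_2=0}^\infty2^{-k_1}2^{-k_2}\bigg|\bigg\langle\sum_{i\in\mathcal{I}_{k_1}}2^{k_1}x_i\varphi_i,\sum_{j\in\mathcal{J}_{k_2}}2^{k_2}y_j\varphi_j\bigg\rangle\bigg|.\qquad(3.20)$$

By the definitions of $\mathcal{I}_{k_1}$ and $\mathcal{J}_{k_2}$, the coefficients of $\varphi_i$ and $\varphi_j$ in (3.20) all lie in $[0,1]$. As such, we continue by applying our first bound:

$$\bigg|\bigg\langle\sum_{i\in\mathcal{I}}x_i\varphi_i,\sum_{j\in\mathcal{J}}y_j\varphi_j\bigg\rangle\bigg|\leq\theta\sum_{k_1=0}^\infty\sum_{k_2=0}^\infty2^{-k_1}2^{-k_2}\big(|\mathcal{I}_{k_1}||\mathcal{J}_{k_2}|\big)^{1/2}=\theta\bigg(\sum_{k=0}^\infty2^{-k}|\mathcal{I}_k|^{1/2}\bigg)\bigg(\sum_{k=0}^\infty2^{-k}|\mathcal{J}_k|^{1/2}\bigg).\qquad(3.21)$$

We now observe from the definition of $\mathcal{I}_k$ that

$$1=\sum_{i\in\mathcal{I}}x_i^2=\sum_{k=0}^\infty\sum_{i\in\mathcal{I}_k}x_i^2\geq\sum_{k=0}^\infty4^{-(k+1)}|\mathcal{I}_k|.$$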



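Thus for any positive integer $t$, the Cauchy-Schwarz inequality gives

$$\sum_{k=0}^\infty2^{-k}|\mathcal{I}_k|^{1/2}=\sum_{k=0}^{t-1}2^{-k}|\mathcal{I}_k|^{1/2}+\sum_{k=t}^\infty2^{-k}|\mathcal{I}_k|^{1/2}\leq t^{1/2}\bigg(\sum_{k=0}^{t-1}4^{-k}|\mathcal{I}_k|\bigg)^{1/2}+K^{1/2}\sum_{k=t}^\infty2^{-k}\leq2\big(t^{1/2}+K^{1/2}2^{-t}\big),\qquad(3.22)$$

and similarly for the $\mathcal{J}_k$'s. For a fixed $K$, we note that (3.22) is minimized when

$$K^{1/2}2^{-t}=\frac{t^{-1/2}}{2\log2},$$

and so we pick $t$ to be the smallest positive integer such that $K^{1/2}2^{-t}\leq\frac{t^{-1/2}}{2\log2}$. With this, we continue (3.21):

$$\bigg|\bigg\langle\sum_{i\in\mathcal{I}}x_i\varphi_i,\sum_{j\in\mathcal{J}}y_j\varphi_j\bigg\rangle\bigg|\leq4\theta\bigg(t^{1/2}+\frac{t^{-1/2}}{2\log2}\bigg)^2=4\theta\bigg(t+\frac{1}{\log2}+\frac{1}{(2\log2)^2t}\bigg).\qquad(3.23)$$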



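From here, we claim that $t\leq\lceil\frac{\log K}{\log2}\rceil$. Considering the definition of $t$, this is easily verified for $K=2,3,\ldots,7$ by showing $K^{1/2}2^{-s}\leq\frac{s^{-1/2}}{2\log2}$ for $s=\lceil\frac{\log K}{\log2}\rceil$. For $K\geq8$, one can use calculus to verify the second inequality of the following:

$$K^{1/2}2^{-\lceil\frac{\log K}{\log2}\rceil}\leq K^{1/2}2^{-\frac{\log K}{\log2}}=K^{-1/2}\leq\frac{1}{2\log2}\bigg(\frac{\log K}{\log2}+1\bigg)^{-1/2}\leq\frac{1}{2\log2}\bigg\lceil\frac{\log K}{\log2}\bigg\rceil^{-1/2},$$

meaning $t\leq\lceil\frac{\log K}{\log2}\rceil$. Substituting $t\leq\frac{\log K}{\log2}+1$ and $t\geq1$ into (3.23) then gives

$$\bigg|\bigg\langle\sum_{i\in\mathcal{I}}x_i\varphi_i,\sum_{j\in\mathcal{J}}y_j\varphi_j\bigg\rangle\bigg|\leq4\theta\bigg(\frac{\log K}{\log2}+1+\frac{1}{\log2}+\frac{1}{(2\log2)^2}\bigg)=\theta(C_0\log K+C_1),$$

with $C_0\approx5.77$ and $C_1\approx11.85$. As such, in this case (3.18) is $\leq C'\theta\log K$ for $K\geq2$, with $C'=C_0+C_1/\log2$.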

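We are now ready for the final bound on (3.18), in which we apply no constraints on the $x_i$'s and $y_j$'s. To do this, we consider the positive and negative real and imaginary parts of these coefficients:

$$x_i=\sum_{k=0}^{3}x_{i,k}\,\mathrm{i}^k\quad\text{s.t.}\quad x_{i,k}\geq0\quad\forall k,$$

and similarly for the $y_j$'s. With this decomposition, we apply the triangle inequality to get

$$\bigg|\bigg\langle\sum_{i\in\mathcal{I}}x_i\varphi_i,\sum_{j\in\mathcal{J}}y_j\varphi_j\bigg\rangle\bigg|\leq\sum_{k_1=0}^{3}\sum_{k_2=0}^{3}\bigg|\bigg\langle\sum_{i\in\mathcal{I}}x_{i,k_1}\varphi_i,\sum_{j\in\mathcal{J}}y_{j,k_2}\varphi_j\bigg\rangle\bigg|.$$

Finally, we normalize the coefficients by $\big(\sum_{i\in\mathcal{I}}x_{i,k_1}^2\big)^{1/2}$ and $\big(\sum_{j\in\mathcal{J}}y_{j,k_2}^2\big)^{1/2}$ so that we can apply our second bound:

$$\bigg|\bigg\langle\sum_{i\in\mathcal{I}}x_i\varphi_i,\sum_{j\in\mathcal{J}}y_j\varphi_j\bigg\rangle\bigg|\leq C'\theta\log K\sum_{k_1=0}^{3}\sum_{k_2=0}^{3}\bigg(\sum_{i\in\mathcal{I}}x_{i,k_1}^2\bigg)^{1/2}\bigg(\sum_{j\in\mathcal{J}}y_{j,k_2}^2\bigg)^{1/2}\leq(C\theta\log K)\|x\|\|y\|,$$

where $C=4C'\approx91.48$ by the Cauchy-Schwarz inequality, and so we are done.

□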
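Several steps of this proof are easy to sanity-check numerically: the constants $C_0$ and $C_1$, the claim $t\leq\lceil\log K/\log2\rceil$, and the inequality $\sum_{k=0}^3\|x_{\cdot,k}\|\leq2\|x\|$ behind $C=4C'$. The pure-Python sketch below checks these over a finite range of $K$ and one test vector, so it is a sanity check rather than a proof. It takes $C'=C_0+C_1/\log2$, which guarantees $C_0\log K+C_1\leq C'\log K$ for $K\geq2$. Names are illustrative.

```python
import math
import random

LOG2 = math.log(2)
C0 = 4 / LOG2
C1 = 4 * (1 + 1 / LOG2 + 1 / (2 * LOG2) ** 2)


def smallest_t(K):
    """Smallest positive integer t with K^(1/2) 2^(-t) <= t^(-1/2) / (2 log 2)."""
    t = 1
    while math.sqrt(K) * 2.0 ** -t > 1 / (2 * LOG2 * math.sqrt(t)):
        t += 1
    return t


def ceil_log2(K):
    """ceil(log K / log 2) for integers K >= 1, computed exactly."""
    return (K - 1).bit_length()


def bound_3_23(t):
    """The factor multiplying theta on the right-hand side of (3.23)."""
    return 4 * (t + 1 / LOG2 + 1 / ((2 * LOG2) ** 2 * t))


def four_part_decomposition(x):
    """Split complex x_i into nonnegative x_{i,k}, k = 0..3, so that
    x_i = sum_k x_{i,k} * 1j**k (positive/negative real and imaginary parts)."""
    parts = [[], [], [], []]
    for v in x:
        parts[0].append(max(v.real, 0.0))
        parts[1].append(max(v.imag, 0.0))
        parts[2].append(max(-v.real, 0.0))
        parts[3].append(max(-v.imag, 0.0))
    return parts


def norm(v):
    return math.sqrt(sum(abs(a) ** 2 for a in v))


if __name__ == "__main__":
    print(f"C0 = {C0:.2f}, C1 = {C1:.2f}, C' = C0 + C1/log 2 = {C0 + C1 / LOG2:.2f}")
    rng = random.Random(1)
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
    parts = four_part_decomposition(x)
    print(sum(norm(p) for p in parts), "<=", 2 * norm(x))
```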
Chapter 4

Two fundamental parameters of frame coherence



Chapters 1-3 of this thesis were dedicated to a particularly popular understanding of compressed 
sensing: that matrices which satisfy the restricted isometry property (RIP) are very well-suited as 
sensing matrices. However, as these chapters show, it is very difficult to deterministically construct 
matrices which are provably RIP. It is therefore desirable to find a worthy alternative to RIP which 
admits deterministic sensing matrices. The present chapter is dedicated to one such alternative, 
namely the strong coherence property, but before we define this property, we first motivate it in the 
context of a support recovery method known as one-step thresholding (OST).

The main idea behind OST is that the noiseless measurement vector $y=\Phi x$ will look similar to the active columns of $\Phi=[\varphi_1\cdots\varphi_N]$, provided the sparsity level is sufficiently small and the nonzero members of $x$ are sufficiently large in some sense. Using this intuition, it makes sense to find the support of $x$ by finding the large values of

$$(\Phi^*y)_i=\langle\Phi x,\varphi_i\rangle=\sum_{j=1}^Nx_j\langle\varphi_j,\varphi_i\rangle=x_i+\sum_{\substack{j=1\\j\neq i}}^Nx_j\langle\varphi_j,\varphi_i\rangle,$$

assuming the columns of $\Phi$ have unit norm. Indeed, if the nonzero entries of $x$ are larger than the contribution of the cross-column interactions, then the above calculation serves as a reasonable test for the support of $x$. The magnitude of this contribution can be assessed using two measures of coherence. First, if the columns are incoherent, then each term of this sum is small, and so it makes sense to consider the worst-case coherence of $\Phi$:

$$\mu:=\max_{\substack{i,j\in\{1,\ldots,N\}\\i\neq j}}|\langle\varphi_i,\varphi_j\rangle|.\qquad(4.1)$$

However, this measure of coherence does not account for sign fluctuations in the inner products, which should produce significant cancellations in the sum. If we assume the support of $x$ is drawn randomly, then a concentration-of-measure argument shows that this sum will typically be close to its expectation. Its size will therefore rarely exceed some multiple of $\|x\|_1$ times the following maximum average:

$$\nu:=\max_{i\in\{1,\ldots,N\}}\frac{1}{N-1}\bigg|\sum_{\substack{j=1\\j\neq i}}^N\langle\varphi_i,\varphi_j\rangle\bigg|.\qquad(4.2)$$

For this reason, this notion of coherence, called average coherence, was recently introduced in [11].

Intuitively, worst-case coherence is a measure of dissimilarity between frame elements, whereas average coherence measures how well the frame elements are distributed in the unit hypersphere. As we will see, both worst-case and average coherence play an important role in various portions of sparse signal processing, provided we describe the sparse signal's support with a probabilistic model. In fact, [11] used worst-case and average coherence to produce probabilistic reconstruction guarantees for OST, permitting sparsity levels on the order of $\frac{M}{\log N}$ (akin to the RIP-based guarantees). In accordance with our motivation above, these probabilistic guarantees require that worst-case and average coherence together satisfy the following property:

Definition 43. We say an $M\times N$ unit norm frame $\Phi$ satisfies the strong coherence property if

$$\text{(SCP-1)}\quad\mu\leq\frac{1}{164\log N}\qquad\text{and}\qquad\text{(SCP-2)}\quad\nu\leq\frac{\mu}{\sqrt M},$$

where $\mu$ and $\nu$ are given by (4.1) and (4.2), respectively.
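Both coherence parameters are cheap to evaluate for a given frame. The sketch below computes $\mu$ and $\nu$ directly from (4.1) and (4.2) and tests Definition 43. It assumes the frame is supplied as a list of unit-norm columns, and it is written in pure Python for clarity rather than speed (about $N^2M$ operations). Names are illustrative.

```python
import math


def inner(x, y):
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def worst_case_coherence(cols):
    """mu from (4.1): largest |<phi_i, phi_j>| over i != j."""
    N = len(cols)
    return max(abs(inner(cols[i], cols[j]))
               for i in range(N) for j in range(N) if i != j)


def average_coherence(cols):
    """nu from (4.2): largest |sum_{j != i} <phi_i, phi_j>| divided by N - 1."""
    N = len(cols)
    return max(abs(sum(inner(cols[i], cols[j]) for j in range(N) if j != i))
               for i in range(N)) / (N - 1)


def has_strong_coherence_property(cols):
    """Check (SCP-1) and (SCP-2) for an M x N unit norm frame given by its columns."""
    M, N = len(cols[0]), len(cols)
    mu, nu = worst_case_coherence(cols), average_coherence(cols)
    return mu <= 1 / (164 * math.log(N)) and nu <= mu / math.sqrt(M)


if __name__ == "__main__":
    # Three unit vectors in R^2 at 120-degree spacing: every inner product is -1/2.
    mb = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)] for k in range(3)]
    print(worst_case_coherence(mb), average_coherence(mb), has_strong_coherence_property(mb))
```

(SCP-1) is very demanding for small examples, because $\mu\leq1/(164\log N)$ effectively forces $M$ to be large. The three-vector example therefore fails it, even though its coherence values are easy to read off.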

The reader should know that the constant 164 is not particularly essential to the above definition; 
it is used in [11] to simplify some analysis and make certain performance guarantees explicit, but 
the constant is by no means optimal. In the next section, we will use the strong coherence property 
to continue the work of [11]. Where [11] provided guarantees for noiseless reconstruction, we will 
produce near-optimal guarantees for signal detection and reconstruction from noisy measurements of 
sparse signals. These guarantees are related to those in [35, 62, 135, 136], and we will also elaborate 
on this relationship. 






The results given in [11] and the following section, as well as the applications discussed in 
[35, 62, 84, 103, 129, 134, 136, 149] demonstrate a pressing need for nearly tight frames with small 
worst-case and average coherence, especially in sparse signal processing. This chapter offers three 
additional contributions in this regard [12, 102]. In Section 4.2, we provide a sizable catalog of
frames that exhibit small spectral norm, worst-case coherence, and average coherence. With all 
three frame parameters provably small, these frames are guaranteed to perform well in relevant 
applications. Next, performance in many applications is dictated by worst-case coherence. It is 
therefore particularly important to understand which worst-case coherence values are achievable. 
To this end, the Welch bound (Theorem 3) is commonly used in the literature. However, the Welch 
bound is only tight when the number of frame elements N is less than the square of the spatial 
dimension M [129]. Another lower bound, given in [106, 146], beats the Welch bound when there 
are more frame elements, but it is known to be loose for real frames [53]. Given this context,
Section 4.3 gives a new lower bound on the worst-case coherence of real frames. Our bound beats
both the Welch bound and the bound in [106, 146] when the number of frame elements far exceeds 
the spatial dimension. Finally, since average coherence is so new, there is currently no intuition as 
to when (SCP-2) is satisfied. In Section 4.4, we use ideas akin to the switching equivalence of graphs 
to transform a frame that satisfies (SCP-1) into another frame with the same spectral norm and 
worst-case coherence that additionally satisfies (SCP-2). 



4.1 Implications of worst-case and average coherence 

Frames with small spectral norm, worst-case coherence, and/or average coherence have found use in 
recent years with applications involving sparse signals. Donoho et al. used the worst-case coherence 
in [62] to provide uniform bounds on the signal and support recovery performance of combinatorial 
and convex optimization methods and greedy algorithms. Later, Tropp [136] and Candès and Plan
[35] used both the spectral norm and worst-case coherence to provide tighter bounds on the signal 
and support recovery performance of convex optimization methods for most support sets under the 
additional assumption that the sparse signals have independent nonzero entries with zero median. 
Recently, Bajwa et al. [11] made use of the spectral norm and both coherence parameters to report 
tighter bounds on the noisy model selection and noiseless signal recovery performance of an incredibly 
fast greedy algorithm called one-step thresholding (OST) for most support sets and arbitrary nonzero
entries. In this section, we discuss further implications of the spectral norm and worst-case and 
average coherence of frames in applications involving sparse signals. 
4.1.1 The weak restricted isometry property

A common task in signal processing applications is to test whether a collection of measurements corresponds to mere noise [90]. For applications involving sparse signals, one can test measurements $y\in\mathbb{C}^M$ against the null hypothesis $H_0:y=z$ and the alternative hypothesis $H_1:y=\Phi x+z$. Here the entries of the noise vector $z\in\mathbb{C}^M$ are independent, identical zero-mean complex-Gaussian random variables, and the signal $x\in\mathbb{C}^N$ is $K$-sparse. The performance of such signal detection problems is governed by the energy in $\Phi x$ [56, 80, 90]. In particular, existing literature on the detection of sparse signals [56, 80] leverages the fact that $\|\Phi x\|^2\approx\|x\|^2$ when $\Phi$ satisfies the restricted isometry property (RIP) of order $K$. In contrast, we now show that the strong coherence property also guarantees $\|\Phi x\|^2\approx\|x\|^2$ for most $K$-sparse vectors. We start with a definition:

Definition 44. We say an $M\times N$ frame $\Phi$ satisfies the $(K,\delta,p)$-weak restricted isometry property (weak RIP) if for every $K$-sparse vector $y\in\mathbb{C}^N$, a random permutation $x$ of $y$'s entries satisfies

$$(1-\delta)\|x\|^2\leq\|\Phi x\|^2\leq(1+\delta)\|x\|^2\qquad(4.3)$$

with probability exceeding $1-p$.
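Weak RIP concerns a fixed $K$-sparse vector whose entries are randomly relocated, so for a given frame it can be estimated by Monte Carlo. The following pure-Python sketch estimates the probability in Definition 44 for one vector $y$. The function names and the number of trials are illustrative choices, not part of the theory.

```python
import random


def energy_ratio(cols, x):
    """||Phi x||^2 / ||x||^2 for a frame given by its columns."""
    M = len(cols[0])
    phi_x = [sum(x[n] * cols[n][m] for n in range(len(cols))) for m in range(M)]
    return sum(abs(v) ** 2 for v in phi_x) / sum(abs(v) ** 2 for v in x)


def weak_rip_success_rate(cols, y, delta, trials=2000, seed=0):
    """Fraction of random permutations x of y's entries that satisfy (4.3).
    Only the locations of the entries are randomized, as in Definition 44."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = list(y)
        rng.shuffle(x)
        if abs(energy_ratio(cols, x) - 1) <= delta:
            hits += 1
    return hits / trials
```

For example, a frame with two identical unit-norm columns preserves the energy of every permutation of $(1,0)$ but of no permutation of $(1,1)$. This illustrates that the outcome depends on the values of the entries, not only on the frame.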

At first glance, it may seem odd that we introduce a random permutation when we might as 
well define weak RIP in terms of a $K$-sparse vector whose support is drawn randomly from all
possible choices. In fact, both versions would be equivalent in distribution, but we stress that in the 
present definition, the values of the nonzero entries of x are not random; rather, the only randomness 
we have is in the locations of the nonzero entries. We wish to distinguish our results from those 
in [35], which explicitly require randomness in the values of the nonzero entries. We also note the 
distinction between RIP and weak RIP — weak RIP requires that $ preserves the energy of most 
sparse vectors. Moreover, the manner in which we quantify "most" is important. For each sparse 
vector, $ preserves the energy of most permutations of that vector, but for different sparse vectors, 
$\Phi$ might not preserve the energy of permutations with the same support. That is, unlike RIP, weak
RIP is not a statement about the singular values of submatrices of $. Certainly, matrices for which 
most submatrices are well-conditioned, such as those discussed in [135, 136], will satisfy weak RIP, 
but weak RIP does not require this. That said, the following theorem shows, in part, the significance 
of the strong coherence property. 

Theorem 45. Any $M\times N$ unit norm frame $\Phi$ with the strong coherence property satisfies the $(K,\delta,\frac{4K}{N^2})$-weak restricted isometry property provided $N\geq128$ and $2K\log N\leq\min\{\frac{\delta^2}{100\mu^2},M\}$.
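Proof. Let $x$ be as in Definition 44. Note that (4.3) is equivalent to $\big|\|\Phi x\|^2-\|x\|^2\big|\leq\delta\|x\|^2$. Defining $\mathcal{K}:=\{n:|x_n|>0\}$, the Cauchy-Schwarz inequality gives

$$\big|\|\Phi x\|^2-\|x\|^2\big|=\big|x_\mathcal{K}^*(\Phi_\mathcal{K}^*\Phi_\mathcal{K}-I_\mathcal{K})x_\mathcal{K}\big|\leq\|x_\mathcal{K}\|\,\|(\Phi_\mathcal{K}^*\Phi_\mathcal{K}-I_\mathcal{K})x_\mathcal{K}\|\leq\sqrt K\|x\|\,\|(\Phi_\mathcal{K}^*\Phi_\mathcal{K}-I_\mathcal{K})x_\mathcal{K}\|_\infty,\qquad(4.4)$$

where the last inequality uses the fact that $\|\cdot\|\leq\sqrt K\|\cdot\|_\infty$ in $\mathbb{C}^K$. We now consider Lemma 3 of [11]. It states that for any $\varepsilon\in[0,1)$ and $a\geq1$, we have $\|(\Phi_\mathcal{K}^*\Phi_\mathcal{K}-I_\mathcal{K})x_\mathcal{K}\|_\infty\leq\varepsilon\|x\|$ with probability exceeding $1-4Ke^{-(\varepsilon-\sqrt K\nu)^2/(16(2+a^{-1})^2\mu^2)}$, provided $K\leq\min\{\varepsilon^2\nu^{-2},(1+a)^{-1}N\}$. We claim that (4.4) together with Lemma 3 of [11] guarantee $\big|\|\Phi x\|^2-\|x\|^2\big|\leq\delta\|x\|^2$ with probability exceeding $1-\frac{4K}{N^2}$.

To establish this claim, we fix $\varepsilon=10\mu\sqrt{2\log N}$ and $a=2\log128-1$. Then (SCP-1) gives $\varepsilon<1$. Also, (SCP-2) and $2K\log N\leq M$ give $\sqrt K\nu\leq\mu/\sqrt{2\log N}=\varepsilon/(20\log N)$, and in particular $K\leq\varepsilon^2\nu^{-2}$. The assumption $N\geq128$ together with $2K\log N\leq M$ implies $K\leq(1+a)^{-1}N$, and so we obtain $e^{-(\varepsilon-\sqrt K\nu)^2/(16(2+a^{-1})^2\mu^2)}\leq\frac{1}{N^2}$. The result now follows from the observation that $2K\log N\leq\frac{\delta^2}{100\mu^2}$ implies $\sqrt K\varepsilon\leq\delta$. □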
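The constant bookkeeping at the end of this proof can be checked numerically. The sketch below evaluates the exponent from Lemma 3 of [11] at the parameter choices above, using $\sqrt K\nu\leq\varepsilon/(20\log N)$. It confirms that the exponent is at least $2\log N$, so each failure term is at most $N^{-2}$. It checks a handful of values of $N$ rather than proving the inequality, and the function names are illustrative.

```python
import math


def exponent_over_log_n(N):
    """Lower bound on (eps - sqrt(K) nu)^2 / (16 (2 + 1/a)^2 mu^2), divided by log N,
    using eps = 10 mu sqrt(2 log N), a = 2 log 128 - 1 and sqrt(K) nu <= eps / (20 log N).
    The common factor mu cancels."""
    a = 2 * math.log(128) - 1
    eps_over_mu = 10 * math.sqrt(2 * math.log(N))
    gap_over_mu = eps_over_mu * (1 - 1 / (20 * math.log(N)))
    return gap_over_mu ** 2 / (16 * (2 + 1 / a) ** 2) / math.log(N)


def eps_under_scp1(N):
    """eps = 10 mu sqrt(2 log N) at the largest mu allowed by (SCP-1)."""
    return 10 * math.sqrt(2 * math.log(N)) / (164 * math.log(N))


if __name__ == "__main__":
    for N in (128, 10**3, 10**6, 10**12):
        print(N, round(exponent_over_log_n(N), 3), round(eps_under_scp1(N), 4))
```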

This theorem shows that having small worst-case and average coherence is enough to guarantee weak RIP. This contrasts with related results by Tropp [135, 136] that require $\Phi$ to be nearly tight. In fact, the proof of Theorem 45 does not even use the full power of the strong coherence property. Instead of (SCP-1), it suffices to have $\mu\leq1/(15\sqrt{\log N})$, part of what [11] calls the coherence property. Also, suppose $\Phi$ has worst-case coherence $\mu=O(1/\sqrt M)$ and average coherence $\nu=O(1/M)$. Then even if $\Phi$ has large spectral norm, Theorem 45 states that $\Phi$ preserves the energy of most $K$-sparse vectors with $K=O(M/\log N)$. This is the sparsity regime that is linear in the number of measurements, up to a logarithmic factor.

4.1.2 Reconstruction of sparse signals from noisy measurements 

Another common task in signal processing applications is to reconstruct a $K$-sparse signal $x\in\mathbb{C}^N$ from a small collection of linear measurements $y\in\mathbb{C}^M$. Recently, Tropp [136] used both the worst-case coherence and spectral norm of frames to find bounds on the reconstruction performance of basis pursuit (BP) [48]. These bounds hold for most support sets, under the assumption that the nonzero entries of $x$ are independent with zero median. In contrast, [11] used the spectral norm and worst-case and average coherence of frames to find bounds on the reconstruction performance of OST for most support sets and arbitrary nonzero entries. However, both [11] and [136] limit themselves to recovering $x$ in the absence of noise, corresponding to $y=\Phi x$, a rather ideal scenario.

Algorithm 1 One-Step Thresholding (OST) for sparse signal reconstruction [11]
Input: An $M\times N$ unit norm frame $\Phi$, a vector $y=\Phi x+z$, and a threshold $\lambda>0$
Output: An estimate $\hat x\in\mathbb{C}^N$ of the true sparse signal $x$
$\hat x\leftarrow0$ {Initialize}
$\tilde x\leftarrow\Phi^*y$ {Form signal proxy}
$\hat{\mathcal{K}}\leftarrow\{n:|\tilde x_n|>\lambda\}$ {Select indices via OST}
$\hat x_{\hat{\mathcal{K}}}\leftarrow(\Phi_{\hat{\mathcal{K}}})^\dagger y$ {Reconstruct signal via least-squares}
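Algorithm 1 is short enough to transcribe directly. The following is a minimal pure-Python sketch. It assumes the frame is given as a list of columns and that the selected columns $\Phi_{\hat{\mathcal{K}}}$ have full column rank, as they do on the event used in the proof of Theorem 46. Under that assumption, the pseudoinverse can be applied by solving the normal equations. A practical implementation would use a numerically stable least-squares routine (for instance, one based on a QR factorization). The helper names are illustrative.

```python
def solve(A, b):
    """Solve A c = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("selected columns are (numerically) linearly dependent")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def ost(cols, y, lam):
    """One-step thresholding (Algorithm 1) for a frame given by its columns.
    Returns the selected index list and the estimate x_hat."""
    N = len(cols)
    # Form signal proxy: x_tilde = Phi^* y
    proxy = [sum(complex(c).conjugate() * yi for c, yi in zip(col, y)) for col in cols]
    # Select indices whose proxy magnitude exceeds the threshold
    support = [n for n in range(N) if abs(proxy[n]) > lam]
    x_hat = [0j] * N
    if support:
        # Least squares on the selected columns via the normal equations
        # (Phi_K^* Phi_K) c = Phi_K^* y, assuming Phi_K has full column rank
        gram = [[sum(complex(u).conjugate() * v for u, v in zip(cols[a], cols[b]))
                 for b in support] for a in support]
        coeffs = solve(gram, [proxy[a] for a in support])
        for n, c in zip(support, coeffs):
            x_hat[n] = c
    return support, x_hat
```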

Our goal in this section is to provide guarantees for the reconstruction of sparse signals from noisy measurements $y=\Phi x+z$. Here the entries of the noise vector $z\in\mathbb{C}^M$ are independent, identical complex-Gaussian random variables with mean zero and variance $\sigma^2$. In particular, and in contrast with [62], our guarantees will hold for arbitrary unit norm frames $\Phi$ without requiring the signal's sparsity level to satisfy $K=O(\mu^{-1})$. The reconstruction algorithm that we analyze here is the OST algorithm of [11], which is described in Algorithm 1. The following theorem extends the analysis of [11] and shows that the OST algorithm leads to near-optimal reconstruction error for certain important classes of sparse signals.

Before proceeding further, we first define some notation. We use $\mathrm{snr}:=\|x\|^2/\mathbb{E}[\|z\|^2]$ to denote the signal-to-noise ratio associated with the signal reconstruction problem. Also, we use

$$\mathcal{T}_\sigma(t):=\Big\{n:|x_n|>\frac{2\sqrt2}{1-t}\sqrt{2\sigma^2\log N}\Big\}$$

for any $t\in(0,1)$ to denote the locations of all the entries of $x$ that, roughly speaking, lie above the noise floor $\sigma$. Finally, we use

$$\mathcal{T}_\mu(t):=\Big\{n:|x_n|>\frac{20}{t}\mu\|x\|\sqrt{2\log N}\Big\}$$

to denote the locations of entries that, roughly speaking, lie above the self-interference floor $\mu\|x\|$.

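Theorem 46 (Reconstruction of sparse signals). Take an $M\times N$ unit norm frame $\Phi$ that satisfies the strong coherence property, pick $t\in(0,1)$, and choose $\lambda=\sqrt{2\sigma^2\log N}\max\{\frac{10}{t}\mu\sqrt{M\,\mathrm{snr}},\frac{\sqrt2}{1-t}\}$. Further, suppose $x\in\mathbb{C}^N$ has support $\mathcal{K}$, drawn uniformly at random from all possible $K$-subsets of $\{1,\ldots,N\}$. Then provided

$$K\leq\frac{N}{c_1^2\|\Phi\|_2^2\log N},\qquad(4.5)$$

Algorithm 1 produces $\hat{\mathcal{K}}$ such that $\mathcal{T}_\sigma(t)\cap\mathcal{T}_\mu(t)\subseteq\hat{\mathcal{K}}\subseteq\mathcal{K}$ and $\hat x$ such that

$$\|x-\hat x\|\leq c_2\sqrt{\sigma^2|\hat{\mathcal{K}}|\log N}+c_3\|x_{\mathcal{K}\setminus\hat{\mathcal{K}}}\|\qquad(4.6)$$

with probability exceeding $1-10N^{-1}$. Finally, define $T:=|\mathcal{T}_\sigma(t)\cap\mathcal{T}_\mu(t)|$ and let $x_T$ denote the best $T$-term approximation of $x$. Then we further have

$$\|x-\hat x\|\leq c_2\sqrt{\sigma^2K\log N}+c_3\|x-x_T\|\qquad(4.7)$$

in the same probability event. Here, $c_1=37e$, $c_2=\frac{2}{1-e^{-1/2}}$, and $c_3=1+\frac{e^{-1/2}}{1-e^{-1/2}}$ are numerical constants.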

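Proof. To begin, note that since $\|\Phi\|_2^2\geq\frac NM$, we have from (4.5) that $K\leq M/(2\log N)$. It is then easy to conclude from Theorem 5 of [11] that $\hat{\mathcal{K}}$ satisfies $\mathcal{T}_\sigma(t)\cap\mathcal{T}_\mu(t)\subseteq\hat{\mathcal{K}}\subseteq\mathcal{K}$ with probability exceeding $1-6N^{-1}$. Therefore, conditioned on the event $\mathcal{E}_1:=\{\mathcal{T}_\sigma(t)\cap\mathcal{T}_\mu(t)\subseteq\hat{\mathcal{K}}\subseteq\mathcal{K}\}$, we can use the triangle inequality to write

$$\|x-\hat x\|\leq\|x_{\hat{\mathcal{K}}}-\hat x_{\hat{\mathcal{K}}}\|+\|x_{\mathcal{K}\setminus\hat{\mathcal{K}}}\|.\qquad(4.8)$$

Next, we may use (4.5) and the fact that $\Phi$ satisfies the strong coherence property to conclude from [135] (see, e.g., Proposition 3 of [11]) that $\|\Phi_\mathcal{K}^*\Phi_\mathcal{K}-I_\mathcal{K}\|_2<e^{-1/2}$ with probability exceeding $1-2N^{-1}$. Define $\mathcal{E}_2:=\{\|\Phi_\mathcal{K}^*\Phi_\mathcal{K}-I_\mathcal{K}\|_2<e^{-1/2}\}$. Conditioning on $\mathcal{E}_1$ and $\mathcal{E}_2$, we have $(\Phi_{\hat{\mathcal{K}}})^\dagger=(\Phi_{\hat{\mathcal{K}}}^*\Phi_{\hat{\mathcal{K}}})^{-1}\Phi_{\hat{\mathcal{K}}}^*$, since $\Phi_{\hat{\mathcal{K}}}$ is a submatrix of the full column rank matrix $\Phi_\mathcal{K}$. Therefore, given $\mathcal{E}_1$ and $\mathcal{E}_2$, we may write

$$\hat x_{\hat{\mathcal{K}}}=\Phi_{\hat{\mathcal{K}}}^\dagger(\Phi x+z)=x_{\hat{\mathcal{K}}}+\Phi_{\hat{\mathcal{K}}}^\dagger\Phi_{\mathcal{K}\setminus\hat{\mathcal{K}}}x_{\mathcal{K}\setminus\hat{\mathcal{K}}}+\Phi_{\hat{\mathcal{K}}}^\dagger z,\qquad(4.9)$$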
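and so substituting (4.9) into (4.8) and applying the triangle inequality gives

$$\|x-\hat x\|\leq\|\Phi_{\hat{\mathcal{K}}}^\dagger\Phi_{\mathcal{K}\setminus\hat{\mathcal{K}}}x_{\mathcal{K}\setminus\hat{\mathcal{K}}}\|+\|\Phi_{\hat{\mathcal{K}}}^\dagger z\|+\|x_{\mathcal{K}\setminus\hat{\mathcal{K}}}\|$$

$$\leq\big(1+\|(\Phi_{\hat{\mathcal{K}}}^*\Phi_{\hat{\mathcal{K}}})^{-1}\|_2\|\Phi_{\hat{\mathcal{K}}}^*\Phi_{\mathcal{K}\setminus\hat{\mathcal{K}}}\|_2\big)\|x_{\mathcal{K}\setminus\hat{\mathcal{K}}}\|+\|(\Phi_{\hat{\mathcal{K}}}^*\Phi_{\hat{\mathcal{K}}})^{-1}\|_2\|\Phi_{\hat{\mathcal{K}}}^*z\|.\qquad(4.10)$$

Given $\mathcal{E}_1$, both $\Phi_{\hat{\mathcal{K}}}^*\Phi_{\hat{\mathcal{K}}}-I_{\hat{\mathcal{K}}}$ and $\Phi_{\hat{\mathcal{K}}}^*\Phi_{\mathcal{K}\setminus\hat{\mathcal{K}}}$ are submatrices of $\Phi_\mathcal{K}^*\Phi_\mathcal{K}-I_\mathcal{K}$. Since the spectral norm of a matrix bounds the spectral norms of its submatrices, we have the following given $\mathcal{E}_1$ and $\mathcal{E}_2$: $\|\Phi_{\hat{\mathcal{K}}}^*\Phi_{\mathcal{K}\setminus\hat{\mathcal{K}}}\|_2<e^{-1/2}$ and $\|(\Phi_{\hat{\mathcal{K}}}^*\Phi_{\hat{\mathcal{K}}})^{-1}\|_2<\frac{1}{1-e^{-1/2}}$. We can now substitute these bounds into (4.10) and use the fact that $\|\Phi_{\hat{\mathcal{K}}}^*z\|\leq|\hat{\mathcal{K}}|^{1/2}\|\Phi_{\hat{\mathcal{K}}}^*z\|_\infty$ to conclude that

$$\|x-\hat x\|\leq\frac{|\hat{\mathcal{K}}|^{1/2}}{1-e^{-1/2}}\|\Phi_{\hat{\mathcal{K}}}^*z\|_\infty+\Big(1+\frac{e^{-1/2}}{1-e^{-1/2}}\Big)\|x_{\mathcal{K}\setminus\hat{\mathcal{K}}}\|$$

given $\mathcal{E}_1$ and $\mathcal{E}_2$. At this point, define the event $\mathcal{E}_3:=\{\|\Phi^*z\|_\infty\leq2\sqrt{\sigma^2\log N}\}$ and note from Lemma 6 of [11] that $\Pr(\mathcal{E}_3^c)\leq2(\sqrt{2\pi\log N}\,N)^{-1}$. A union bound therefore gives (4.6) with probability exceeding $1-10N^{-1}$. For (4.7), note that $\hat{\mathcal{K}}\subseteq\mathcal{K}$ implies $|\hat{\mathcal{K}}|\leq K$. Moreover, $\mathcal{T}_\sigma(t)\cap\mathcal{T}_\mu(t)\subseteq\hat{\mathcal{K}}$ implies $\|x_{\mathcal{K}\setminus\hat{\mathcal{K}}}\|\leq\|x_{\mathcal{K}\setminus(\mathcal{T}_\sigma(t)\cap\mathcal{T}_\mu(t))}\|=\|x-x_T\|$. □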
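The threshold in Theorem 46 and the sets $\mathcal{T}_\sigma(t)$ and $\mathcal{T}_\mu(t)$ are explicit, so they can be computed whenever $\sigma^2$, $\mu$, and $\|x\|$ are known; in practice these would have to be estimated. The sketch below assumes $\mathbb{E}[\|z\|^2]=M\sigma^2$, which holds for the noise model above. Each branch of $\lambda$ is exactly half of the corresponding threshold defining $\mathcal{T}_\sigma(t)$ or $\mathcal{T}_\mu(t)$. Function names are illustrative.

```python
import math


def ost_threshold(sigma2, N, M, mu, x_norm, t):
    """lambda from Theorem 46, with snr = ||x||^2 / E||z||^2 and E||z||^2 = M sigma^2."""
    snr = x_norm ** 2 / (M * sigma2)
    return math.sqrt(2 * sigma2 * math.log(N)) * max(10 / t * mu * math.sqrt(M * snr),
                                                      math.sqrt(2) / (1 - t))


def t_sigma_set(x, sigma2, t):
    """Indices of entries above the noise-floor threshold defining T_sigma(t)."""
    thresh = 2 * math.sqrt(2) / (1 - t) * math.sqrt(2 * sigma2 * math.log(len(x)))
    return {n for n, v in enumerate(x) if abs(v) > thresh}


def t_mu_set(x, mu, t):
    """Indices of entries above the self-interference threshold defining T_mu(t)."""
    x_norm = math.sqrt(sum(abs(v) ** 2 for v in x))
    thresh = 20 / t * mu * x_norm * math.sqrt(2 * math.log(len(x)))
    return {n for n, v in enumerate(x) if abs(v) > thresh}
```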

A few remarks are in order for Theorem 46. First, if $\Phi$ satisfies the strong coherence property and is nearly tight, then (4.5) shows that OST handles sparsity that is almost linear in $M$: $K=O(M/\log N)$. Second, we do not impose any control over the size of $T$; instead, we state the result in generality in terms of $T$. Its size is determined by the signal class that $x$ belongs to, the worst-case coherence of the frame $\Phi$ used to measure $x$, and the magnitude of the noise that perturbs $\Phi x$. Third, the $\ell_2$ error of the OST algorithm has two parts. The first is the near-optimal (modulo the log factor) error of $\sqrt{\sigma^2K\log N}$. The second is the best $T$-term approximation error, caused by the inability of the OST algorithm to recover signal entries that are smaller than $O(\mu\|x\|\sqrt{2\log N})$. In particular, suppose the $K$-sparse signal $x$, the worst-case coherence $\mu$, and the noise $z$ together satisfy $\|x-x_T\|=O(\sqrt{\sigma^2K\log N})$. Then the OST algorithm succeeds with a near-optimal $\ell_2$ error of $\|x-\hat x\|=O(\sqrt{\sigma^2K\log N})$. To see why this error is near-optimal, note that a $K$-dimensional vector of random entries with mean zero and variance $\sigma^2$ has expected squared norm $\sigma^2K$. In our case, we pay an additional log factor to find the locations of the $K$ nonzero entries among the entire $N$-dimensional signal. It is important to recognize that the optimality condition $\|x-x_T\|=O(\sqrt{\sigma^2K\log N})$ depends on the signal class, the noise variance, and the worst-case coherence of the frame. In particular, the condition is satisfied whenever $\|x_{\mathcal{K}\setminus\mathcal{T}_\mu(t)}\|=O(\sqrt{\sigma^2K\log N})$, since

$$\|x-x_T\|\leq\|x_{\mathcal{K}\setminus\mathcal{T}_\sigma(t)}\|+\|x_{\mathcal{K}\setminus\mathcal{T}_\mu(t)}\|=O(\sqrt{\sigma^2K\log N})+\|x_{\mathcal{K}\setminus\mathcal{T}_\mu(t)}\|.$$

The following lemma provides classes of sparse signals that satisfy $\|x_{\mathcal{K}\setminus\mathcal{T}_\mu(t)}\|=O(\sqrt{\sigma^2K\log N})$ given sufficiently small noise variance and worst-case coherence. Consequently, the OST algorithm is near-optimal for the reconstruction of such signal classes.

Lemma 47. Take an $M\times N$ unit norm frame $\Phi$ with worst-case coherence $\mu\leq\frac{c_0}{\sqrt M}$ for some $c_0>0$, and suppose that $K\leq\frac{M}{c_1^2\log N}$ for some $c_1>0$. Fix a constant $\beta\in(0,1]$. Suppose the magnitudes of $\beta K$ nonzero entries of $x$ are some $\alpha=\Omega(\sqrt{\sigma^2\log N})$. The magnitudes of the remaining $(1-\beta)K$ nonzero entries need not be the same, but they are smaller than $\alpha$ and scale as $O(\sqrt{\sigma^2\log N})$. Then $\|x_{\mathcal{K}\setminus\mathcal{T}_\mu(t)}\|=O(\sqrt{\sigma^2K\log N})$, provided $c_0<\frac{tc_1}{20\sqrt2}$.

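Proof. Let $\mathcal{K}$ be the support of $x$, and define $\mathcal{I}:=\{n:|x_n|=\alpha\}$. We wish to show that $\mathcal{I}\subseteq\mathcal{T}_\mu(t)$, since this implies $\|x_{\mathcal{K}\setminus\mathcal{T}_\mu(t)}\|\leq\|x_{\mathcal{K}\setminus\mathcal{I}}\|=O(\sqrt{\sigma^2K\log N})$. To prove $\mathcal{I}\subseteq\mathcal{T}_\mu(t)$, notice that

$$\|x\|^2=\|x_\mathcal{I}\|^2+\|x_{\mathcal{K}\setminus\mathcal{I}}\|^2\leq\beta K\alpha^2+(1-\beta)K\alpha^2=K\alpha^2.$$

Combining this with the assumptions $\mu\leq c_0/\sqrt M$ and $K\leq M/(c_1^2\log N)$ gives

$$\frac{20}{t}\mu\|x\|\sqrt{2\log N}\leq\frac{20}{t}\frac{c_0}{\sqrt M}\sqrt K\,\alpha\sqrt{2\log N}\leq\frac{20\sqrt2\,c_0}{tc_1}\alpha.$$

Therefore, provided $c_0<\frac{tc_1}{20\sqrt2}$, we have $\mathcal{I}\subseteq\mathcal{T}_\mu(t)$. □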

In words, Lemma 47 implies that OST is near-optimal for those $K$-sparse signals whose entries above the noise floor have roughly the same magnitude. This subsumes a very important class of signals that appears in applications such as multi-label prediction [86], in which all the nonzero entries take values $\pm\alpha$. Theorem 46 is the first result in the sparse signal processing literature that provides near-optimal reconstruction guarantees for such signals from noisy measurements without requiring RIP. It applies to either random or deterministic frames, even when $K=O(M/\log N)$.

Note that our techniques can be extended to reconstruct noisy signals. That is, we may consider measurements of the form $y=\Phi(x+n)+z$, where $n\in\mathbb{C}^N$ is also a noise vector of independent, identical zero-mean complex-Gaussian random variables. If the frame $\Phi$ is tight, then our measurements will not color the noise, and so noise in the signal may be viewed as noise in the measurements: $y=\Phi x+(\Phi n+z)$. If the frame is not tight, then the noise will become correlated in the measurements, and performance would depend nontrivially on the frame's Gram matrix. Also, Theorem 46 can be generalized to approximately sparse signals. The analysis follows similar lines but is rather cumbersome, and the end result appears to be strong enough only for very nearly sparse signals. As such, we omit this result.

4.2 Frame constructions 

In this section, we consider a range of nearly tight frames with small worst-case and average
coherence. We investigate various ways of selecting frames at random from different libraries, and we
show that for each of these frames, the spectral norm, worst-case coherence, and average coherence
are all small with high probability. Later, we will consider deterministic constructions that use
Gabor and chirp systems, spherical designs, equiangular tight frames, and error-correcting codes.
For the reader's convenience, all of these constructions are summarized in Table 4.1. Before we go
any further, we consider the following lemma, which gives three different sufficient conditions for a
frame to satisfy (SCP-2). These conditions will prove quite useful in this section and throughout
the chapter.

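Lemma 48. For any $M \times N$ unit norm frame $\Phi$, each of the following conditions implies $\nu_\Phi \le \frac{\mu_\Phi}{\sqrt{M}}$:
(i) $\langle\varphi_k, \sum_{n=1}^N\varphi_n\rangle = \frac{N}{M}$ for every $k = 1, \ldots, N$,
(ii) $N \ge 2M$ and $\sum_{n=1}^N\varphi_n = 0$,
(iii) $N \ge M^2 + 3M + 3$ and $\|\sum_{n=1}^N\varphi_n\| \le \sqrt{N}$.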
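Proof. For condition (i), we have

$$\nu_\Phi = \frac{1}{N-1}\max_{i\in\{1,\ldots,N\}}\bigg|\sum_{\substack{j=1\\ j\ne i}}^N\langle\varphi_i,\varphi_j\rangle\bigg| = \frac{1}{N-1}\max_{i\in\{1,\ldots,N\}}\bigg|\Big\langle\varphi_i,\sum_{j=1}^N\varphi_j\Big\rangle - \|\varphi_i\|^2\bigg| = \frac{1}{N-1}\Big(\frac{N}{M} - 1\Big) = \frac{N-M}{M(N-1)}.$$

The Welch bound (Theorem 3) therefore gives $\nu_\Phi = \frac{N-M}{M(N-1)} \le \mu_\Phi\sqrt{\frac{N-M}{M(N-1)}} \le \frac{\mu_\Phi}{\sqrt{M}}$.
For condition (ii), we have

$$\nu_\Phi = \frac{1}{N-1}\max_{i\in\{1,\ldots,N\}}\bigg|\Big\langle\varphi_i,\sum_{j=1}^N\varphi_j\Big\rangle - \|\varphi_i\|^2\bigg| = \frac{1}{N-1}.$$

Considering the Welch bound, it suffices to show $\frac{1}{N-1} \le \frac{1}{\sqrt{M}}\sqrt{\frac{N-M}{M(N-1)}}$. Rearranging gives

$$N^2 - (M+1)N - M(M-1) \ge 0. \qquad (4.11)$$

When $N = 2M$, the left-hand side of (4.11) becomes $M(M-1)$, which is trivially nonnegative.
Otherwise, we have

$$N \ge 2M + 1 \ge M + 1 + \sqrt{M(M-1)} \ge \frac{M+1}{2} + \sqrt{\Big(\frac{M+1}{2}\Big)^2 + M(M-1)}.$$

In this case, by the quadratic formula and the fact that the left-hand side of (4.11) is concave up in $N$,
we have that (4.11) is indeed satisfied. For condition (iii), we use the triangle and Cauchy-Schwarz
inequalities to get

$$\nu_\Phi = \frac{1}{N-1}\max_{i\in\{1,\ldots,N\}}\bigg|\Big\langle\varphi_i,\sum_{j=1}^N\varphi_j\Big\rangle - 1\bigg| \le \frac{1}{N-1}\bigg(\max_{i\in\{1,\ldots,N\}}\|\varphi_i\|\Big\|\sum_{j=1}^N\varphi_j\Big\| + 1\bigg) \le \frac{\sqrt{N}+1}{N-1}.$$

Considering the Welch bound, it suffices to show $\frac{\sqrt{N}+1}{N-1} \le \frac{1}{\sqrt{M}}\sqrt{\frac{N-M}{M(N-1)}}$. Taking $x := \sqrt{N}$ and
rearranging gives a polynomial inequality: $x^4 - (M^2 + M + 1)x^2 - 2M^2x - M(M-1) \ge 0$. By convexity and
monotonicity of the polynomial in $[M + \frac{3}{2}, \infty)$, it can be shown that the largest real root of this
polynomial is always smaller than $M + \frac{3}{2}$. Also, since the polynomial is concave up in $x$ on this interval, it suffices that
$\sqrt{N} = x \ge M + \frac{3}{2}$, which we have since $N \ge M^2 + 3M + 3 \ge (M + \frac{3}{2})^2$. □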
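The quantities in Lemma 48 are easy to probe numerically. The following Python/NumPy sketch computes $\mu_\Phi$ and $\nu_\Phi$ directly from their definitions and checks condition (ii) on a small harmonic frame. The row set $\{1, 2, 3, 5, 8\}$ with $N = 16$ and the helper names (`harmonic_frame`, `average_coherence`, and so on) are illustrative choices of ours, not constructions from the text. Because row 0 of the DFT is excluded, the columns sum to zero.

```python
import numpy as np

def worst_case_coherence(Phi):
    """mu = max over i != j of |<phi_i, phi_j>| (columns assumed to have unit norm)."""
    G = Phi.conj().T @ Phi
    np.fill_diagonal(G, 0)
    return float(np.abs(G).max())

def average_coherence(Phi):
    """nu = (1/(N-1)) * max_i |sum_{j != i} <phi_i, phi_j>|."""
    N = Phi.shape[1]
    G = Phi.conj().T @ Phi
    off_diagonal_sums = G.sum(axis=1) - np.diag(G)
    return float(np.abs(off_diagonal_sums).max() / (N - 1))

def welch_bound(M, N):
    return float(np.sqrt((N - M) / (M * (N - 1))))

def harmonic_frame(rows, N):
    """Rows of the N x N DFT selected by `rows`, scaled so every column has unit norm."""
    k = np.asarray(rows)[:, None]
    ell = np.arange(N)[None, :]
    return np.exp(2j * np.pi * k * ell / N) / np.sqrt(len(rows))

# Condition (ii): N >= 2M and the columns sum to zero, since row 0 is not selected.
N, rows = 16, [1, 2, 3, 5, 8]
Phi = harmonic_frame(rows, N)
M = Phi.shape[0]
mu, nu = worst_case_coherence(Phi), average_coherence(Phi)
print(f"mu = {mu:.4f}, Welch bound = {welch_bound(M, N):.4f}")
print(f"nu = {nu:.4f}, 1/(N-1) = {1 / (N - 1):.4f}, mu/sqrt(M) = {mu / np.sqrt(M):.4f}")
```

Here $\nu_\Phi$ equals $1/(N-1)$ exactly, as in the proof of condition (ii), and it sits well below $\mu_\Phi/\sqrt{M}$.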

4.2.1 Normalized Gaussian frames 

Construct a matrix with independent, Gaussian-distributed entries that have zero mean and unit
variance. By normalizing the columns, we get a matrix called a normalized Gaussian frame. This
is perhaps the most widely studied type of frame in the signal processing and statistics literature.
To be clear, the term "normalized" is intended to distinguish the results presented here from
results reported in earlier works, such as [11, 17, 38, 140], which only ensure that Gaussian frame
elements have unit norm in expectation. In other words, normalized Gaussian frame elements are
independently and uniformly distributed on the unit hypersphere in $\mathbb{R}^M$. The following theorem
characterizes the spectral norm and the worst-case and average coherence of normalized Gaussian
frames.

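Theorem 49 (Geometry of normalized Gaussian frames). Build a real $M \times N$ frame $\Psi$ by drawing
entries independently at random from a Gaussian distribution of zero mean and unit variance. Next,
construct a normalized Gaussian frame $\Phi$ by taking $\varphi_n := \frac{\psi_n}{\|\psi_n\|}$ for every $n = 1, \ldots, N$. Provided
$60\log N \le M \le \frac{N-1}{4\log N}$, then the following simultaneously hold with probability exceeding $1 - 11N^{-1}$:

(i) $\mu_\Phi \le \frac{\sqrt{15\log N}}{\sqrt{M} - \sqrt{12\log N}}$,
(ii) $\nu_\Phi \le \frac{\sqrt{15\log N}}{M - \sqrt{12M\log N}}$,
(iii) $\|\Phi\|_2 \le \frac{\sqrt{M} + \sqrt{N} + \sqrt{2\log N}}{\sqrt{M - \sqrt{8M\log N}}}$.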

Proof. Theorem 49(i) can be shown to hold with probability exceeding $1 - 2N^{-1}$ by using a bound
on the norm of a Gaussian random vector in Lemma 1 of [95] and a bound on the magnitude of
the inner product of two independent Gaussian random vectors in Lemma 6 of [79]. Specifically,
pick any two distinct indices $i, j \in \{1, \ldots, N\}$, and define probability events $\mathcal{E}_1 := \{|\langle\psi_i,\psi_j\rangle| \le \epsilon_1\}$,
$\mathcal{E}_2 := \{\|\psi_i\|^2 \ge M(1 - \epsilon_2)\}$, and $\mathcal{E}_3 := \{\|\psi_j\|^2 \ge M(1 - \epsilon_2)\}$ for $\epsilon_1 = \sqrt{15M\log N}$ and $\epsilon_2 =
\sqrt{(12\log N)/M}$. Then it follows from the union bound that

$$\Pr\bigg(|\langle\varphi_i,\varphi_j\rangle| > \frac{\epsilon_1}{M(1-\epsilon_2)}\bigg) = \Pr\bigg(\frac{|\langle\psi_i,\psi_j\rangle|}{\|\psi_i\|\|\psi_j\|} > \frac{\epsilon_1}{M(1-\epsilon_2)}\bigg) \le \Pr(\mathcal{E}_1^c) + \Pr(\mathcal{E}_2^c) + \Pr(\mathcal{E}_3^c).$$

One can verify that $\Pr(\mathcal{E}_2^c) = \Pr(\mathcal{E}_3^c) \le N^{-3}$ because of Lemma 1 of [95], and we further have
$\Pr(\mathcal{E}_1^c) \le 2N^{-3}$ because of Lemma 6 of [79] and the fact that $M \ge 60\log N$. Thus, for any fixed $i$
and $j$, $|\langle\varphi_i,\varphi_j\rangle| \le \sqrt{15\log N}/(\sqrt{M} - \sqrt{12\log N})$ with probability exceeding $1 - 4N^{-3}$. It therefore
follows by taking a union bound over all $\binom{N}{2}$ choices for $i$ and $j$ that Theorem 49(i) holds with
probability exceeding $1 - 2N^{-1}$.

Theorem 49(ii) can be shown to hold with probability exceeding $1 - 6N^{-1}$ by appealing to the
preceding analysis and Hoeffding's inequality for a sum of independent, bounded random variables [83].
Specifically, fix any index $i \in \{1, \ldots, N\}$, and define random variables $Z_{ij} := \frac{1}{N-1}\langle\varphi_i,\varphi_j\rangle$. Next,
define the probability event

$$\mathcal{E}_4 := \bigcap_{\substack{j=1\\ j\ne i}}^N\bigg\{|Z_{ij}| \le \frac{1}{N-1}\cdot\frac{\sqrt{15\log N}}{\sqrt{M} - \sqrt{12\log N}}\bigg\}.$$

Using the analysis for the worst-case coherence of $\Phi$ and taking a union bound over the $N - 1$
possible $j$'s gives $\Pr(\mathcal{E}_4^c) \le 4N^{-2}$. Furthermore, taking $\epsilon_3 := \sqrt{15\log N}/(M - \sqrt{12M\log N})$, then
elementary probability analysis gives

$$\Pr\bigg(\Big|\sum_{j\ne i}Z_{ij}\Big| > \epsilon_3\bigg) \le \int_{\mathbb{S}^{M-1}}\Pr\bigg(\Big|\sum_{j\ne i}Z_{ij}\Big| > \epsilon_3\,\bigg|\,\mathcal{E}_4, \varphi_i = x\bigg)p_{\varphi_i}(x)\,d\mathcal{H}^{M-1}(x) + 4N^{-2}, \qquad (4.12)$$

where $\mathbb{S}^{M-1}$ denotes the unit hypersphere in $\mathbb{R}^M$, $\mathcal{H}^{M-1}$ denotes the $(M-1)$-dimensional Hausdorff
measure on $\mathbb{S}^{M-1}$, and $p_{\varphi_i}(x)$ denotes the probability density function for the random vector $\varphi_i$.

The first thing to note here is that the random variables $\{Z_{ij} : j \ne i\}$ are bounded and jointly
independent when conditioned on $\mathcal{E}_4$ and $\varphi_i$. This assertion mainly follows from Bayes' rule and
the fact that $\{\varphi_j : j \ne i\}$ are jointly independent when conditioned on $\varphi_i$. The second thing to
note is that $\mathbb{E}[Z_{ij} \mid \mathcal{E}_4, \varphi_i] = 0$ for every $j \ne i$. This comes from the fact that the random vectors
$\{\varphi_n\}_{n=1}^N$ are independent and have a uniform distribution over $\mathbb{S}^{M-1}$, which in turn guarantees that
the random variables $\{Z_{ij} : j \ne i\}$ have a symmetric distribution around zero when conditioned
on $\mathcal{E}_4$ and $\varphi_i$. We can therefore make use of Hoeffding's inequality [83] to bound the probability
expression inside the integral in (4.12) as

$$\Pr\bigg(\Big|\sum_{j\ne i}Z_{ij}\Big| > \epsilon_3\,\bigg|\,\mathcal{E}_4, \varphi_i = x\bigg) \le 2e^{-(N-1)/2M}, \qquad (4.13)$$

which is bounded above by $2N^{-2}$ provided $M \le \frac{N-1}{4\log N}$. We can now substitute (4.13) into (4.12)
and take the union bound over the $N$ possible choices for $i$ to conclude that Theorem 49(ii) holds
with probability exceeding $1 - 6N^{-1}$.

Lastly, Theorem 49(iii) can be shown to hold with probability exceeding $1 - 3N^{-1}$ by using a
bound on the spectral norm of standard Gaussian random matrices reported in [117] along with
Lemma 1 of [95]. Specifically, define an $N \times N$ diagonal matrix $D := \mathrm{diag}(\|\psi_1\|^{-1}, \ldots, \|\psi_N\|^{-1})$,
and note that the entries of $\Psi = \Phi D^{-1}$ are independently and normally distributed with zero mean
and unit variance. We therefore have from (2.3) in [117] that

$$\Pr\big(\|\Psi\|_2 > \sqrt{M} + \sqrt{N} + \sqrt{2\log N}\big) \le 2N^{-1}. \qquad (4.14)$$

In addition, we can appeal to the preceding analysis for the probability bound on Theorem 49(i)
and conclude using Lemma 1 of [95] and a union bound over the $N$ possible choices for $i$ that

$$\Pr\Big(\|D\|_2 > \big(M - \sqrt{8M\log N}\big)^{-1/2}\Big) \le N^{-1}. \qquad (4.15)$$

Finally, since $\|\Phi\|_2 \le \|\Psi\|_2\|D\|_2$, we can take a union bound over (4.14) and (4.15) to argue that
Theorem 49(iii) holds with probability exceeding $1 - 3N^{-1}$.

The complete result now follows by taking a union bound over the failure probabilities for the
conditions (i)-(iii) in Theorem 49. □

Example 50. To illustrate the bounds in Theorem 49, we ran simulations in MATLAB. Picking
$N = 50000$, we observed 30 realizations of normalized Gaussian frames for each $M = 700, 900, 1100$.
The distributions of $\mu$, $\nu$, and $\|\Phi\|_2$ were rather tight, so we only report the ranges of values attained,
along with the bounds given in Theorem 49:

$M = 700$: $\mu \in [0.1849, 0.2072] \le 0.8458$, $\nu \in [0.5643, 0.6613] \times 10^{-3} \le 0.0320$, $\|\Phi\|_2 \in [8.0521, 8.0835] \le 11.9565$;
$M = 900$: $\mu \in [0.1946, 0.2206] \le 0.6848$, $\nu \in [0.5800, 0.7501] \times 10^{-3} \le 0.0229$, $\|\Phi\|_2 \in [8.4352, 8.4617] \le 10.3645$;
$M = 1100$: $\mu \in [0.1807, 0.1988] \le 0.5852$, $\nu \in [0.5260, 0.6713] \times 10^{-3} \le 0.0177$, $\|\Phi\|_2 \in [7.7262, 7.7492] \le 9.2927$.

These simulations seem to indicate that our bounds on $\mu$ and $\|\Phi\|_2$ reflect real-world behavior, at
least within an order of magnitude, whereas the bound on $\nu$ is rather loose.
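A minimal Python/NumPy sketch of Theorem 49 (Example 50 itself used MATLAB, and that code is not reproduced here). `theorem49_bounds` evaluates the right-hand sides of (i)-(iii), which at $M = 700$, $N = 50000$ should reproduce the bounds listed in Example 50. The random instance ($M = 200$, $N = 2000$, seed 0) is our own choice and lies outside the regime $60\log N \le M \le (N-1)/(4\log N)$, so it only illustrates how $\mu$, $\nu$ and $\|\Phi\|_2$ are measured, not the theorem's guarantees.

```python
import numpy as np

def normalized_gaussian_frame(M, N, rng):
    """Draw an M x N standard Gaussian matrix and rescale each column to unit norm."""
    Psi = rng.standard_normal((M, N))
    return Psi / np.linalg.norm(Psi, axis=0)

def theorem49_bounds(M, N):
    """Right-hand sides of Theorem 49 (i)-(iii); meaningful only when 60 log N <= M <= (N-1)/(4 log N)."""
    L = np.log(N)
    mu_bound = np.sqrt(15 * L) / (np.sqrt(M) - np.sqrt(12 * L))
    nu_bound = np.sqrt(15 * L) / (M - np.sqrt(12 * M * L))
    norm_bound = (np.sqrt(M) + np.sqrt(N) + np.sqrt(2 * L)) / np.sqrt(M - np.sqrt(8 * M * L))
    return float(mu_bound), float(nu_bound), float(norm_bound)

# The bounds quoted in Example 50 for M = 700, N = 50000.
print(["%.4f" % b for b in theorem49_bounds(700, 50000)])

# A small instance (outside the theorem's regime) showing how mu, nu and ||Phi||_2 are measured.
rng = np.random.default_rng(0)
M, N = 200, 2000
Phi = normalized_gaussian_frame(M, N, rng)
off = Phi.T @ Phi - np.eye(N)                     # Gram matrix with the unit diagonal removed
mu = float(np.abs(off).max())
nu = float(np.abs(off.sum(axis=1)).max() / (N - 1))
spectral_norm = float(np.linalg.norm(Phi, 2))     # largest singular value
print(f"mu = {mu:.4f}, nu = {nu:.2e}, ||Phi||_2 = {spectral_norm:.4f}")
```

For any unit norm frame, $\|\Phi\|_2 \ge \sqrt{N/M}$ and $\mu_\Phi$ is at least the Welch bound, so those two inequalities are safe sanity checks on the measured values.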

4.2.2 Random harmonic frames

Random harmonic frames, constructed by randomly selecting rows of a discrete Fourier transform 
(DFT) matrix and normalizing the resulting columns, have received considerable attention lately in 
the compressed sensing literature [36, 39, 118]. However, there is no result in the literature that 
gives the worst-case coherence of random harmonic frames. To fill this gap, the following theorem 
gives the spectral norm and the worst-case and average coherence of random harmonic frames. 

Theorem 51 (Geometry of random harmonic frames). Let $F$ be an $N \times N$ non-normalized discrete
Fourier transform matrix, explicitly, $F_{k\ell} := e^{2\pi ik\ell/N}$ for each $k, \ell = 0, \ldots, N-1$. Next, let $\{B_i\}_{i=0}^{N-1}$
be a collection of independent Bernoulli random variables with mean $\frac{M}{N}$, and take $\mathcal{M} := \{i : B_i = 1\}$.
Finally, construct an $|\mathcal{M}| \times N$ harmonic frame $\Phi$ by collecting rows of $F$ which correspond to indices
in $\mathcal{M}$ and normalizing the columns. Then $\Phi$ is a unit norm tight frame: $\|\Phi\|_2^2 = \frac{N}{|\mathcal{M}|}$. Also, provided
$16\log N \le M \le \frac{N}{3}$, the following simultaneously hold with probability exceeding $1 - 4N^{-1} - N^{-2}$:

(i) $\frac{1}{2}M \le |\mathcal{M}| \le \frac{3}{2}M$,
(ii) $\nu_\Phi \le \frac{\mu_\Phi}{\sqrt{|\mathcal{M}|}}$,
(iii) $\mu_\Phi \le \sqrt{\frac{118(N-M)\log N}{MN}}$.
Proof. The claim that $\Phi$ is tight follows trivially from the fact that the rows of $F$ are orthogonal
and that the rows of $\Phi$ correspond to a subset of the rows of $F$. Next, we define the probability
events $\mathcal{E}_1 := \{|\mathcal{M}| \le \frac{3}{2}M\}$ and $\mathcal{E}_2 := \{|\mathcal{M}| \ge \frac{1}{2}M\}$, and claim that $\Pr(\mathcal{E}_1^c \cup \mathcal{E}_2^c) \le N^{-1} + N^{-2}$. The
proof of this claim follows from a Bernstein-like large deviation inequality. Specifically, note that
$|\mathcal{M}| = \sum_{i=0}^{N-1}B_i$ with $\mathbb{E}[|\mathcal{M}|] = M$, and so we have from Theorems A.1.12 and A.1.13 of [7] and
page 4 of [118] that for any $\epsilon_1 \in [0, 1)$,

$$\Pr\big(|\mathcal{M}| > (1 + \epsilon_1)M\big) \le e^{-\epsilon_1^2(1-\epsilon_1)M/2} \quad\text{and}\quad \Pr\big(|\mathcal{M}| < (1 - \epsilon_1)M\big) \le e^{-\epsilon_1^2M/2}. \qquad (4.16)$$

Taking $\epsilon_1 := \frac{1}{2}$, a union bound then gives $\Pr(\mathcal{E}_1^c \cup \mathcal{E}_2^c) \le N^{-1} + N^{-2}$ provided $M \ge 16\log N$.

Conditioning on $\mathcal{E}_1 \cap \mathcal{E}_2$, we have that Theorem 51(i) holds trivially, while Theorem 51(ii) follows
from Lemma 48. Specifically, we have that $\frac{N}{3} \ge M$ guarantees $N \ge 2|\mathcal{M}|$ because of the conditioning
on $\mathcal{E}_1 \cap \mathcal{E}_2$, which in turn implies that $\Phi$ satisfies either condition (i) or (ii) of Lemma 48, depending
on whether $0 \in \mathcal{M}$. This therefore establishes that Theorem 51(i)-(ii) simultaneously hold with
probability exceeding $1 - N^{-1} - N^{-2}$.

The only remaining claim is that $\mu_\Phi \le \epsilon_2 := \sqrt{118(N-M)\log N/(MN)}$ with high probability.
To this end, define $p := \frac{M}{N}$ and pick any two distinct indices $i, j \in \{0, \ldots, N-1\}$. Note that

$$\langle\varphi_i,\varphi_j\rangle = \frac{1}{|\mathcal{M}|}\sum_{k=0}^{N-1}B_k\overline{F_{ki}}F_{kj} = \frac{1}{|\mathcal{M}|}\sum_{k=0}^{N-1}(B_k - p)\overline{F_{ki}}F_{kj}, \qquad (4.17)$$

where the last equality follows from the fact that $F$ has orthogonal columns. Next, we write $\overline{F_{ki}}F_{kj} =
\cos(\theta_k) + i\sin(\theta_k)$ for some $\theta_k \in [0, 2\pi)$. Then applying the union bound to (4.17) and to the real
and imaginary parts of $\overline{F_{ki}}F_{kj}$ gives

$$\begin{aligned}\Pr\big(|\langle\varphi_i,\varphi_j\rangle| > \epsilon_2\big) &\le \Pr\bigg(\Big|\sum_{k=0}^{N-1}(B_k - p)\overline{F_{ki}}F_{kj}\Big| > \frac{\epsilon_2M}{2}\bigg) + \Pr\Big(|\mathcal{M}| < \frac{M}{2}\Big)\\ &\le \Pr\bigg(\Big|\sum_{k=0}^{N-1}(B_k - p)\cos(\theta_k)\Big| > \frac{\epsilon_2M}{2\sqrt{2}}\bigg) + \Pr\bigg(\Big|\sum_{k=0}^{N-1}(B_k - p)\sin(\theta_k)\Big| > \frac{\epsilon_2M}{2\sqrt{2}}\bigg) + N^{-3},\end{aligned} \qquad (4.18)$$

where the last term follows from (4.16) and the fact that $M \ge 16\log N$. Define random variables
$Z_k := (B_k - p)\cos(\theta_k)$. Note that the $Z_k$'s have zero mean and are jointly independent. Also, the $Z_k$'s
are bounded by $1 - p$ almost surely since $|(B_k - p)\cos(\theta_k)| \le \max\{p, 1 - p\}$ and $N \ge 2M$. Moreover,
the variance of each $Z_k$ is bounded: $\mathrm{Var}(Z_k) \le p(1 - p)$. Therefore, we may use the Bernstein
inequality for a sum of independent, bounded random variables [21] to bound the probability that
$|\sum_{k=0}^{N-1}Z_k|$ exceeds $\epsilon_3 := \frac{\epsilon_2M}{2\sqrt{2}}$:

$$\Pr\bigg(\Big|\sum_{k=0}^{N-1}(B_k - p)\cos(\theta_k)\Big| > \epsilon_3\bigg) \le 2e^{-\epsilon_3^2/(2Np(1-p) + 2(1-p)\epsilon_3/3)} \le 2N^{-3}.$$

Similarly, the probability that $|\sum_{k=0}^{N-1}(B_k - p)\sin(\theta_k)| > \epsilon_3$ is also bounded above by $2N^{-3}$.
Substituting these probability bounds into (4.18) gives $|\langle\varphi_i,\varphi_j\rangle| > \epsilon_2$ with probability at most $5N^{-3}$
provided $M \ge 16\log N$. Finally, we take a union bound over the $\binom{N}{2}$ possible choices for $i$ and $j$ to
get that Theorem 51(iii) holds with probability exceeding $1 - 3N^{-1}$.

The result now follows by taking a final union bound over $\mathcal{E}_1^c \cup \mathcal{E}_2^c \cup \{\mu_\Phi > \epsilon_2\}$. □

As stated earlier, random harmonic frames are not new to sparse signal processing. Interestingly,
for the application of compressed sensing, [38, 118] provide performance guarantees for both
random harmonic and Gaussian frames, but require more rows in a random harmonic frame to
accommodate the same level of sparsity. This suggests that random harmonic frames may be inferior
to Gaussian frames as compressed sensing matrices, but practice suggests otherwise [63]. In a sense,
Theorem 51 helps to resolve this gap in understanding; there exist compressed sensing algorithms
whose performance is dictated by worst-case coherence [11, 62, 134, 136], and Theorem 51 states
that random harmonic frames have near-optimal worst-case coherence, being on the order of the
Welch bound with an additional $\sqrt{\log N}$ factor.

Example 52. To illustrate the bounds in Theorem 51, we ran simulations in MATLAB. Picking
$N = 5000$, we observed 30 realizations of random harmonic frames for each $M = 1000, 1250, 1500$.
The distributions of $|\mathcal{M}|$, $\nu$, and $\mu$ were rather tight, so we only report the ranges of values attained,
along with the bounds given in Theorem 51. Notice that Theorem 51 gives a bound on $\nu$ in terms
of both $|\mathcal{M}|$ and $\mu$. To simplify matters, we show that $\nu \le \frac{\min\mu}{\sqrt{\max|\mathcal{M}|}} \le \frac{\mu}{\sqrt{|\mathcal{M}|}}$, where the minimum
and maximum are taken over all realizations in the sample:

$M = 1000$: $|\mathcal{M}| \in [961, 1052] \subseteq [500, 1500]$, $\nu \in [0.2000, 0.8082] \times 10^{-3} \le 0.0023$, $\mu \in [0.0746, 0.0890] \le 0.8967$;
$M = 1250$: $|\mathcal{M}| \in [1207, 1305] \subseteq [625, 1875]$, $\nu \in [0.2000, 0.6273] \times 10^{-3} \le 0.0018$, $\mu \in [0.0623, 0.0774] \le 0.7766$;
$M = 1500$: $|\mathcal{M}| \in [1454, 1590] \subseteq [750, 2250]$, $\nu \in [0.2000, 0.4841] \times 10^{-3} \le 0.0015$, $\mu \in [0.0571, 0.0743] \le 0.6849$.

The reader may have noticed how consistently the average coherence value of $\nu \approx 0.2000 \times 10^{-3}$
was realized. This occurs precisely when the zeroth row of the DFT is not selected, as the frame
elements sum to zero in this case:

$$\nu := \frac{1}{N-1}\max_{i\in\{1,\ldots,N\}}\bigg|\sum_{\substack{j=1\\ j\ne i}}^N\langle\varphi_i,\varphi_j\rangle\bigg| = \frac{1}{N-1}\max_{i\in\{1,\ldots,N\}}\bigg|\Big\langle\varphi_i,\sum_{j=1}^N\varphi_j\Big\rangle - \|\varphi_i\|^2\bigg| = \frac{1}{N-1} \approx 0.2000 \times 10^{-3}.$$

These simulations seem to indicate that our bounds on $\nu$ and $\mu$ leave room for improvement.
The only bound that lies within an order of magnitude of real-world behavior is our bound on $|\mathcal{M}|$.
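The construction in Theorem 51 can be sketched as follows, assuming Python/NumPy; the parameters $M = 150$, $N = 1000$ and the seed are arbitrary choices that satisfy $16\log N \le M \le N/3$. The code also illustrates the observation from Example 52: when row 0 is not selected, the columns sum to zero and $\nu_\Phi = 1/(N-1)$, while if row 0 is selected, Lemma 48(i) applies with $|\mathcal{M}|$ in place of $M$ and $\nu_\Phi = (N - |\mathcal{M}|)/(|\mathcal{M}|(N-1))$.

```python
import numpy as np

def random_harmonic_frame(M, N, rng):
    """Keep each row of the N x N DFT independently with probability M/N, then normalize columns."""
    rows = np.flatnonzero(rng.random(N) < M / N)
    k, ell = rows[:, None], np.arange(N)[None, :]
    return np.exp(2j * np.pi * k * ell / N) / np.sqrt(len(rows)), rows

def coherences(Phi):
    """Return (mu, nu) computed directly from the Gram matrix of a unit norm frame."""
    N = Phi.shape[1]
    off = Phi.conj().T @ Phi - np.eye(N)
    return float(np.abs(off).max()), float(np.abs(off.sum(axis=1)).max() / (N - 1))

def theorem51_mu_bound(M, N):
    return float(np.sqrt(118 * (N - M) * np.log(N) / (M * N)))

rng = np.random.default_rng(1)
M, N = 150, 1000
Phi, rows = random_harmonic_frame(M, N, rng)
size = len(rows)
mu, nu = coherences(Phi)
print(f"|M| = {size}, zeroth row selected: {0 in rows}")
print(f"mu = {mu:.4f} (bound {theorem51_mu_bound(M, N):.4f}), nu = {nu:.2e}")
```

Evaluating `theorem51_mu_bound(1000, 5000)` recovers the bound $0.8967$ quoted in Example 52.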

4.2.3 Gabor and chirp frames 

Gabor frames constitute an important class of frames, as they appear in a variety of applications such
as radar [82], speech processing [145], and quantum information theory [121]. Given a nonzero seed
function $f : \mathbb{Z}_M \to \mathbb{C}$, we produce all time- and frequency-shifted versions: $f_{xy}(t) := f(t - x)e^{2\pi iyt/M}$,
$t \in \mathbb{Z}_M$. Viewing these shifted functions as vectors in $\mathbb{C}^M$ gives an $M \times M^2$ Gabor frame. The
following theorem characterizes the spectral norm and the worst-case and average coherence of Gabor
frames generated from either a deterministic Alltop vector [3] or a random Steinhaus vector.

Theorem 53 (Geometry of Gabor frames). Take an Alltop function defined by $f(t) := \frac{1}{\sqrt{M}}e^{2\pi it^3/M}$,
$t \in \mathbb{Z}_M$. Also, take a random Steinhaus function defined by $g(t) := \frac{1}{\sqrt{M}}e^{2\pi i\theta_t}$, $t \in \mathbb{Z}_M$, where
the $\theta_t$'s are independent random variables distributed uniformly on the unit interval. Then the
$M \times M^2$ Gabor frames $\Phi$ and $\Psi$ generated by $f$ and $g$, respectively, are unit norm and tight, i.e.,
$\|\Phi\|_2 = \|\Psi\|_2 = \sqrt{M}$. Also, both frames have average coherence $\le \frac{1}{M+1}$. Furthermore, if $M \ge 5$
is prime, then $\mu_\Phi = \frac{1}{\sqrt{M}}$, while if $M \ge 13$, then $\mu_\Psi \le \sqrt{(13\log M)/M}$ with probability exceeding
$1 - 4M^{-1}$.

Proof. The tightness claim follows from [96], in which it was shown that Gabor frames generated
by nonzero seed vectors are tight. The bound on average coherence is a consequence of Theorem 7
of [11] concerning arbitrary Gabor frames. The claim concerning $\mu_\Phi$ follows directly from [129],
while the claim concerning $\mu_\Psi$ is a simple consequence of Theorem 5.1 of [111]. □

Instead of taking all translates and modulates of a seed function, [41] constructs chirp frames
by taking all powers and modulates of a chirp function. Picking $M$ to be prime, we start with a
chirp function $h_M : \mathbb{Z}_M \to \mathbb{C}$ defined by $h_M(t) := e^{\pi it(t-1)/M}$, $t \in \mathbb{Z}_M$. The frame elements
are then defined entrywise by $h_{ab}(t) := \frac{1}{\sqrt{M}}h_M(t)^ae^{2\pi ibt/M}$, $t \in \mathbb{Z}_M$. Certainly, chirp frames are,
at the very least, similar in spirit to Gabor frames. As a matter of fact, the chirp frame is in
some sense equivalent to the Gabor frame generated by the Alltop function: it is easy to verify
that $h_{(-6x,\,y-3x^2)}(t) = e^{2\pi i(t^3+x^3)/M}f_{xy}(t)$, and when $M \ge 5$, the map $(x, y) \mapsto (-6x, y - 3x^2)$ is
a permutation over $\mathbb{Z}_M^2$. Using terminology from Definition 67, we say the chirp frame is wiggling
equivalent to a unitary rotation of permuted Alltop Gabor frame elements. As such, by Lemma 68,
the chirp frame has the same spectral norm and worst-case coherence as the Alltop Gabor frame, but
the average coherence may be different. In this case, the average coherence still satisfies (SCP-2).
Indeed, adding the frame elements gives

$$\sum_{a=0}^{M-1}\sum_{b=0}^{M-1}h_{ab}(t) = \frac{1}{\sqrt{M}}\sum_{a=0}^{M-1}h_M(t)^a\sum_{b=0}^{M-1}e^{2\pi ibt/M} = \sqrt{M}\Big(\sum_{a=0}^{M-1}h_M(0)^a\Big)\delta_0(t) = M^{3/2}\delta_0(t),$$

and so $\langle h_{a'b'}, \sum_{a=0}^{M-1}\sum_{b=0}^{M-1}h_{ab}\rangle = \langle h_{a'b'}, M^{3/2}\delta_0\rangle = M^{3/2}h_{a'b'}(0) = M$. Therefore, applying
Lemma 48(i) gives the result:

Theorem 54 (Geometry of chirp frames). Pick $M$ prime, and let $\Phi$ be the $M \times M^2$ frame of all
powers and modulates of the chirp function $h_M$. Then $\Phi$ is a unit norm tight frame with $\|\Phi\|_2 = \sqrt{M}$,
and has worst-case coherence $\mu_\Phi = \frac{1}{\sqrt{M}}$ and average coherence $\nu_\Phi \le \frac{\mu_\Phi}{\sqrt{M}}$.
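For a quick check of Theorems 53 and 54 at $M = 5$, the sketch below builds the Alltop Gabor frame and the chirp frame exactly as defined above (Python/NumPy; the function names are ours). It confirms tightness, $\mu = 1/\sqrt{M}$ for both frames, and $\nu = 1/(M+1)$ for the chirp frame, in line with the computation via Lemma 48(i).

```python
import numpy as np

def alltop_gabor_frame(M):
    """All M^2 shifts f_xy(t) = f(t - x) e^{2 pi i y t / M} of the Alltop vector f(t) = e^{2 pi i t^3 / M} / sqrt(M)."""
    t = np.arange(M)
    f = np.exp(2j * np.pi * (t**3 % M) / M) / np.sqrt(M)
    # np.roll(f, x)[t] == f[(t - x) mod M]
    cols = [np.roll(f, x) * np.exp(2j * np.pi * y * t / M) for x in range(M) for y in range(M)]
    return np.stack(cols, axis=1)

def chirp_frame(M):
    """All h_ab(t) = M^{-1/2} h_M(t)^a e^{2 pi i b t / M}, where h_M(t) = e^{pi i t (t - 1) / M}."""
    t = np.arange(M)
    cols = [np.exp(1j * np.pi * a * t * (t - 1) / M + 2j * np.pi * b * t / M) / np.sqrt(M)
            for a in range(M) for b in range(M)]
    return np.stack(cols, axis=1)

def coherences(Phi):
    """Return (mu, nu) computed directly from the Gram matrix of a unit norm frame."""
    N = Phi.shape[1]
    off = Phi.conj().T @ Phi - np.eye(N)
    return float(np.abs(off).max()), float(np.abs(off.sum(axis=1)).max() / (N - 1))

M = 5
Phi_alltop, Phi_chirp = alltop_gabor_frame(M), chirp_frame(M)
mu_alltop, _ = coherences(Phi_alltop)
mu_chirp, nu_chirp = coherences(Phi_chirp)
print(f"Alltop: mu = {mu_alltop:.4f}; chirp: mu = {mu_chirp:.4f}, nu = {nu_chirp:.4f}")
print(f"1/sqrt(M) = {1 / np.sqrt(M):.4f}, 1/(M+1) = {1 / (M + 1):.4f}")
```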

Example 55. To illustrate the bounds in Theorems 53 and 54, we consider the examples of an
Alltop Gabor frame and a chirp frame, each with $M = 5$. In this case, the Gabor frame has
$\nu \approx 0.1348 \le 0.1667 \approx \frac{1}{M+1}$, while the chirp frame has $\nu = \frac{1}{6} \le \frac{1}{5} = \frac{\mu}{\sqrt{M}}$. Note the Gabor
and chirp frames have different average coherences despite being equivalent in some sense. For the
random Steinhaus Gabor frame, we ran simulations in MATLAB and observed 30 realizations for
each $M = 60, 70, 80$. The distributions of $\nu$ and $\mu$ were rather tight, so we only report the ranges of
values attained, along with the bounds given in Theorem 53:

$M = 60$: $\nu \in [0.3916, 0.5958] \times 10^{-2} \le 0.0164$, $\mu \in [0.3242, 0.4216] \le 0.9419$;
$M = 70$: $\nu \in [0.3151, 0.4532] \times 10^{-2} \le 0.0141$, $\mu \in [0.2989, 0.3814] \le 0.8883$;
$M = 80$: $\nu \in [0.2413, 0.3758] \times 10^{-2} \le 0.0124$, $\mu \in [0.2711, 0.3796] \le 0.8439$.

These simulations seem to indicate that the bound on $\nu$ is conservative by an order of magnitude.



4.2.4 Spherical 2-designs 

Lemma 48(ii) leads one to consider frames of vectors that sum to zero. In [84], it is proved that real
unit norm tight frames with this property make up another well-studied class of vector packings:
spherical 2-designs. To be clear, a collection of unit-norm vectors $\Phi \subseteq \mathbb{R}^M$ is called a spherical
$t$-design if, for every polynomial $g(x_1, \ldots, x_M)$ of degree at most $t$, we have

$$\frac{1}{|\Phi|}\sum_{\varphi\in\Phi}g(\varphi) = \frac{1}{\mathcal{H}^{M-1}(\mathbb{S}^{M-1})}\int_{\mathbb{S}^{M-1}}g(x)\,d\mathcal{H}^{M-1}(x),$$

where $\mathbb{S}^{M-1}$ is the unit hypersphere in $\mathbb{R}^M$ and $\mathcal{H}^{M-1}$ denotes the $(M-1)$-dimensional Hausdorff
measure on $\mathbb{S}^{M-1}$. In words, vectors that form a spherical $t$-design serve as good representatives
when calculating the average value of a degree-$t$ polynomial over the unit hypersphere. Today, such
designs find application in quantum state estimation [81].

Since real unit norm tight frames always exist for $N \ge M + 1$, one might suspect that spherical 2-designs
are equally common, but this intuition is faulty: the sum-to-zero condition introduces certain
issues. For example, there is no spherical 2-design when $M$ is odd and $N = M + 2$. In [101], spherical
2-designs are explicitly characterized by construction. The following theorem gives a construction
based on harmonic frames:

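Theorem 56 (Geometry of spherical 2-designs). Pick $M$ even and $N \ge 2M$. Take an $\frac{M}{2} \times N$
harmonic frame $\Psi$ by collecting rows from a discrete Fourier transform matrix according to a set
of nonzero indices $\mathcal{M}$ and normalizing the columns. Let $m(n)$ denote the $n$th largest index in $\mathcal{M}$, and
define a real $M \times N$ frame $\Phi$ by

$$\Phi_{k\ell} := \begin{cases}\sqrt{\frac{2}{M}}\cos\Big(\frac{2\pi m(\frac{k+1}{2})\ell}{N}\Big), & k \text{ odd},\\[4pt] \sqrt{\frac{2}{M}}\sin\Big(\frac{2\pi m(\frac{k}{2})\ell}{N}\Big), & k \text{ even},\end{cases} \qquad k = 1, \ldots, M,\ \ell = 0, \ldots, N-1.$$

Then $\Phi$ is unit norm and tight, i.e., $\|\Phi\|_2^2 = \frac{N}{M}$, with worst-case coherence $\mu_\Phi \le \mu_\Psi$ and average
coherence $\nu_\Phi \le \frac{\mu_\Phi}{\sqrt{M}}$.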

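Proof. It is easy to verify that $\Phi$ is a unit norm tight frame using the geometric sum formula. Also,
since the frame elements sum to zero and $N \ge 2M$, the claim regarding average coherence follows
from Lemma 48(ii). It remains to prove $\mu_\Phi \le \mu_\Psi$. For each pair of indices $i, j \in \{1, \ldots, N\}$, we have

$$\langle\varphi_i,\varphi_j\rangle = \frac{2}{M}\sum_{m\in\mathcal{M}}\Big[\cos\Big(\frac{2\pi mi}{N}\Big)\cos\Big(\frac{2\pi mj}{N}\Big) + \sin\Big(\frac{2\pi mi}{N}\Big)\sin\Big(\frac{2\pi mj}{N}\Big)\Big] = \frac{2}{M}\sum_{m\in\mathcal{M}}\cos\Big(\frac{2\pi m(i-j)}{N}\Big) = \mathrm{Re}\langle\psi_i,\psi_j\rangle,$$

and so $|\langle\varphi_i,\varphi_j\rangle| \le |\langle\psi_i,\psi_j\rangle|$. This gives the result. □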

Example 57. To illustrate the bounds in Theorem 56, we consider the spherical 2-design constructed
from a $9 \times 37$ harmonic equiangular tight frame [146]. Specifically, we take a $37 \times 37$ DFT matrix,
choose nonzero row indices

$$\mathcal{M} = \{1, 7, 9, 10, 12, 16, 26, 33, 34\},$$

and normalize the columns to get a harmonic frame $\Psi$ whose worst-case coherence achieves the
Welch bound: $\mu_\Psi = \sqrt{\frac{37-9}{9(37-1)}} \approx 0.2940$. Following Theorem 56, we produce a spherical 2-design $\Phi$
with $\mu_\Phi \approx 0.1967 \le \mu_\Psi$ and $\nu_\Phi \approx 0.0278 \le 0.0464 \approx \frac{\mu_\Phi}{\sqrt{M}}$.
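Example 57 can be reproduced with a short sketch, assuming Python/NumPy. `spherical_2_design` follows the case definition in Theorem 56, ordering $\mathcal{M}$ from largest to smallest as $m(n)$ prescribes. Since the columns sum to zero, $\nu_\Phi$ should come out as $1/(N-1) = 1/36 \approx 0.0278$. The helper names are ours.

```python
import numpy as np

def harmonic_frame(rows, N):
    """Rows of the N x N DFT selected by `rows`, scaled so every column has unit norm."""
    k, ell = np.asarray(rows)[:, None], np.arange(N)[None, :]
    return np.exp(2j * np.pi * k * ell / N) / np.sqrt(len(rows))

def spherical_2_design(rows, N):
    """Real M x N frame of Theorem 56 with M = 2|rows|: cosine rows at odd k, sine rows at even k."""
    m_values = sorted(rows, reverse=True)          # m(n) = nth largest index in the row set
    M = 2 * len(m_values)
    ell = np.arange(N)
    Phi = np.empty((M, N))
    for n, m in enumerate(m_values):               # 1-based k = 2n+1 (odd) and k = 2n+2 (even)
        Phi[2 * n] = np.cos(2 * np.pi * m * ell / N)
        Phi[2 * n + 1] = np.sin(2 * np.pi * m * ell / N)
    return np.sqrt(2 / M) * Phi

def coherences(Phi):
    """Return (mu, nu) computed directly from the Gram matrix of a unit norm frame."""
    N = Phi.shape[1]
    off = Phi.conj().T @ Phi - np.eye(N)
    return float(np.abs(off).max()), float(np.abs(off.sum(axis=1)).max() / (N - 1))

rows, N = [1, 7, 9, 10, 12, 16, 26, 33, 34], 37
Psi, Phi = harmonic_frame(rows, N), spherical_2_design(rows, N)
M = Phi.shape[0]
mu_psi, _ = coherences(Psi)
mu_phi, nu_phi = coherences(Phi)
print(f"mu_Psi = {mu_psi:.4f}, mu_Phi = {mu_phi:.4f}, nu_Phi = {nu_phi:.4f}, "
      f"mu_Phi/sqrt(M) = {mu_phi / np.sqrt(M):.4f}")
```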

4.2.5 Steiner equiangular tight frames

We now consider the construction of Chapter 1: Steiner equiangular tight frames (ETFs). Recall
that these fail to break the square-root bottleneck as deterministic RIP matrices. By contrast,
Steiner ETFs are particularly well-suited as sensing matrices for one-step thresholding. To be clear,
every Steiner ETF satisfies $N \ge 2M$. Moreover, if in step (iii) of Theorem 7 we choose the distinct
rows to be rows of the (complex) Hadamard matrix $H$ that are not all-ones, then the columns
of each $F_j$ sum to zero, meaning the columns of $F$ also sum to zero. This was done in
(1.6), and the columns sum to zero accordingly. Therefore, by Lemma 48(ii), Steiner ETFs satisfy
(SCP-2). This gives the following theorem:

Theorem 58 (Geometry of Steiner equiangular tight frames). Build an $M \times N$ matrix $\Phi$ according
to Theorem 7, and in step (iii), choose rows from the (complex) Hadamard matrix $H$ that are not
all-ones. Then $\Phi$ is an equiangular tight frame, meaning $\|\Phi\|_2^2 = \frac{N}{M}$ and $\mu_\Phi = \sqrt{\frac{N-M}{M(N-1)}}$, and has
average coherence $\nu_\Phi \le \frac{\mu_\Phi}{\sqrt{M}}$.



Example 59. To illustrate the bound in Theorem 58, we note that the example given in (1.6) has
$\nu_\Phi = \frac{1}{15} \le \frac{1}{3\sqrt{6}} = \frac{\mu_\Phi}{\sqrt{M}}$.
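Theorem 7 is stated in Chapter 1 and is not reproduced here, so the following is only a sketch under an assumption: that it follows the usual Steiner recipe, in which each column of a Steiner system's incidence matrix is expanded by writing distinct rows of a Hadamard matrix into its nonzero positions and rescaling by $1/\sqrt{r}$. The $(2,2,4)$-Steiner system (all pairs from four points), the $4 \times 4$ Sylvester Hadamard matrix, and the assignment of rows to blocks are our choices. Using only the rows that are not all-ones, as in Theorem 58, the result is a $6 \times 16$ ETF whose columns sum to zero, so $\nu_\Phi = 1/(N-1)$.

```python
import itertools
import numpy as np

def steiner_etf_6x16():
    """Sketch of a Steiner ETF from the (2,2,4)-Steiner system and a 4 x 4 Sylvester Hadamard matrix."""
    v, r = 4, 3                                          # 4 points; each point lies in r = 3 blocks
    blocks = list(itertools.combinations(range(v), 2))   # all 6 pairs form a (2,2,4)-Steiner system
    H = np.array([[1, 1, 1, 1],
                  [1, -1, 1, -1],
                  [1, 1, -1, -1],
                  [1, -1, -1, 1]])
    not_all_ones = H[1:]                                 # the r rows of H other than the all-ones row
    Phi = np.zeros((len(blocks), v * (r + 1)))
    for j in range(v):
        containing = [b for b, block in enumerate(blocks) if j in block]
        # write a distinct row of H into each block (row of Phi) that contains point j
        for row_of_H, b in zip(not_all_ones, containing):
            Phi[b, j * (r + 1):(j + 1) * (r + 1)] = row_of_H
    return Phi / np.sqrt(r)

Phi = steiner_etf_6x16()
M, N = Phi.shape
off = Phi.T @ Phi - np.eye(N)
mu = float(np.abs(off).max())
nu = float(np.abs(off.sum(axis=1)).max() / (N - 1))
print(f"{M} x {N}: mu = {mu:.4f}, nu = {nu:.4f}, mu/sqrt(M) = {mu / np.sqrt(M):.4f}")
```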



4.2.6 Code-based frames

Many structures in coding theory are also useful in frame theory. In this section, we build frames from
a code that originally emerged with Berlekamp in [22], and found recent reincarnation with [147].
We build a $2^m \times 2^{(t+1)m}$ frame, indexing rows by elements of $\mathbb{F}_{2^m}$ and indexing columns by $(t+1)$-tuples
of elements from $\mathbb{F}_{2^m}$. For $x \in \mathbb{F}_{2^m}$ and $\alpha \in \mathbb{F}_{2^m}^{t+1}$, the corresponding entry of the matrix $\Phi$ is
given by

$$\Phi_{x\alpha} = \frac{1}{\sqrt{2^m}}(-1)^{\mathrm{Tr}\left[\alpha_0x + \sum_{i=1}^t\alpha_ix^{2^i+1}\right]}, \qquad (4.19)$$

where $\mathrm{Tr} : \mathbb{F}_{2^m} \to \mathbb{F}_2$ denotes the trace map, defined by $\mathrm{Tr}(z) = \sum_{i=0}^{m-1}z^{2^i}$. The following theorem
gives the spectral norm and the worst-case and average coherence of this frame.

Theorem 60 (Geometry of code-based frames). The $2^m \times 2^{(t+1)m}$ frame defined by (4.19) is unit
norm and tight, i.e., $\|\Phi\|_2^2 = 2^{tm}$, with worst-case coherence $\mu_\Phi \le \frac{1}{\sqrt{2^{m-2t-1}}}$ and average coherence
$\nu_\Phi \le \frac{\mu_\Phi}{\sqrt{M}}$.

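Proof. For the tightness claim, we use the linearity of the trace map to write the inner product of
rows $x$ and $y$:

$$\frac{1}{2^m}\sum_{\alpha\in\mathbb{F}_{2^m}^{t+1}}(-1)^{\mathrm{Tr}\left[\alpha_0(x+y) + \sum_{i=1}^t\alpha_i(x^{2^i+1}+y^{2^i+1})\right]} = \frac{1}{2^m}\Big(\sum_{\alpha_0\in\mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_0(x+y)]}\Big)\prod_{i=1}^t\Big(\sum_{\alpha_i\in\mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_i(x^{2^i+1}+y^{2^i+1})]}\Big).$$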
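| Name | R/C | Size | Worst-case coherence $\mu_\Phi$ | Average coherence $\nu_\Phi$ |
|---|---|---|---|---|
| Normalized Gaussian | $\mathbb{R}$ | $M \times N$ | $\le \frac{\sqrt{15\log N}}{\sqrt{M}-\sqrt{12\log N}}$ | $\le \frac{\sqrt{15\log N}}{M-\sqrt{12M\log N}}$ |
| Random harmonic | $\mathbb{C}$ | $\lvert\mathcal{M}\rvert \times N$, $\frac{1}{2}M \le \lvert\mathcal{M}\rvert \le \frac{3}{2}M$ | $\le \sqrt{\frac{118(N-M)\log N}{MN}}$ | $\le \frac{\mu_\Phi}{\sqrt{\lvert\mathcal{M}\rvert}}$ |
| Alltop Gabor | $\mathbb{C}$ | $M \times M^2$ | $= \frac{1}{\sqrt{M}}$ | $\le \frac{1}{M+1}$ |
| Steinhaus Gabor | $\mathbb{C}$ | $M \times M^2$ | $\le \sqrt{\frac{13\log M}{M}}$ | $\le \frac{1}{M+1}$ |
| Chirp | $\mathbb{C}$ | $M \times M^2$ | $= \frac{1}{\sqrt{M}}$ | $\le \frac{\mu_\Phi}{\sqrt{M}}$ |
| Spherical 2-design from harmonic $\Psi$ | $\mathbb{R}$ | $M \times N$ | $\le \mu_\Psi$ | $\le \frac{\mu_\Phi}{\sqrt{M}}$ |
| Steiner | $\mathbb{C}$ | $M \times N$ | $= \sqrt{\frac{N-M}{M(N-1)}}$ | $\le \frac{\mu_\Phi}{\sqrt{M}}$ |
| Code-based | $\mathbb{R}$ | $2^m \times 2^{(t+1)m}$ | $\le \frac{1}{\sqrt{2^{m-2t-1}}}$ | $\le \frac{\mu_\Phi}{\sqrt{M}}$ |

Table 4.1: Eight constructions detailed in this chapter. The bounds given for the normalized Gaussian,
random harmonic and Steinhaus Gabor frames are satisfied with high probability. All of the
frames above are unit norm tight frames except for the normalized Gaussian frame, which has
squared spectral norm $\|\Phi\|_2^2 \le (\sqrt{M} + \sqrt{N} + \sqrt{2\log N})^2/(M - \sqrt{8M\log N})$ in the same probability
event.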



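This expression is $2^{tm}$ when $x = y$. Otherwise, note that $\alpha_0 \mapsto (-1)^{\mathrm{Tr}[\alpha_0(x+y)]} \in \{\pm1\}$ defines a
homomorphism on $\mathbb{F}_{2^m}$. Since $x + y \ne 0$ and the trace map is onto $\mathbb{F}_2$, this homomorphism takes the value $-1$ (for instance at $\alpha_0 = (x+y)^{-1}z$ with $\mathrm{Tr}(z) = 1$), so the inverse images of $\pm1$ under this homomorphism
must form two cosets of equal size, and so $\sum_{\alpha_0\in\mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_0(x+y)]} = 0$, meaning distinct rows in $\Phi$
are orthogonal. Thus, $\Phi$ is a unit norm tight frame.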

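For the worst-case coherence claim, we first note that the linearity of the trace map gives

$$\langle\varphi_\alpha,\varphi_{\alpha'}\rangle = \frac{1}{2^m}\sum_{x\in\mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}\left[(\alpha_0+\alpha_0')x + \sum_{i=1}^t(\alpha_i+\alpha_i')x^{2^i+1}\right]},$$

i.e., every inner product between columns of $\Phi$ is a sum over another column. Thus, there exists
$\alpha \in \mathbb{F}_{2^m}^{t+1}$ with $\alpha \ne 0$ such that

$$\begin{aligned}2^{2m}\mu_\Phi^2 &= \bigg(\sum_{x\in\mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}\left[\alpha_0x + \sum_{i=1}^t\alpha_ix^{2^i+1}\right]}\bigg)^2 = \sum_{x,y\in\mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}\left[\alpha_0(x+y) + \sum_{i=1}^t\alpha_i(x^{2^i+1}+y^{2^i+1})\right]}\\ &= 2^m + \sum_{x\ne y}(-1)^{\mathrm{Tr}\left[\alpha_0(x+y) + \sum_{i=1}^t\alpha_i\left((x+y)^{2^i+1} + \sum_{j=0}^{i-1}(xy)^{2^j}(x+y)^{2^i-2^{j+1}+1}\right)\right]},\end{aligned}$$

where the last equality is by the identity $(x+y)^{2^i+1} = x^{2^i+1} + y^{2^i+1} + \sum_{j=0}^{i-1}(xy)^{2^j}(x+y)^{2^i-2^{j+1}+1}$,
whose proof is a simple exercise of induction. From here, we perform a change of variables: $u := x + y$
and $v := xy$. Notice that $(u, v)$ corresponds to $(x, y)$ for some $x \ne y$ whenever $(z + x)(z + y) =
z^2 + uz + v$ has two solutions, that is, whenever $\mathrm{Tr}(\frac{v}{u^2}) = 0$. Since $(u, v)$ corresponds to both $(x, y)$
and $(y, x)$, we must correct for under-counting: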



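$$\begin{aligned}2^{2m}\mu_\Phi^2 &= 2^m + 2\sum_{u\ne0}\sum_{\substack{v\in\mathbb{F}_{2^m}\\ \mathrm{Tr}(v/u^2)=0}}(-1)^{\mathrm{Tr}\left[\alpha_0u + \sum_{i=1}^t\alpha_i\left(u^{2^i+1} + \sum_{j=0}^{i-1}v^{2^j}u^{2^i-2^{j+1}+1}\right)\right]}\\ &= 2^m + 2\sum_{u\ne0}(-1)^{\mathrm{Tr}\left[\alpha_0u + \sum_{i=1}^t\alpha_iu^{2^i+1}\right]}\sum_{\substack{v\in\mathbb{F}_{2^m}\\ \mathrm{Tr}(v/u^2)=0}}(-1)^{\mathrm{Tr}[p(u)v]}\\ &\le 2^m + 2\sum_{u\ne0}\bigg|\sum_{\substack{v\in\mathbb{F}_{2^m}\\ \mathrm{Tr}(v/u^2)=0}}(-1)^{\mathrm{Tr}[p(u)v]}\bigg|,\end{aligned} \qquad (4.20)$$

where the second equality is by repeated application of $\mathrm{Tr}(z) = \mathrm{Tr}(z^2)$, and

$$p(u) := \sum_{i=1}^t\sum_{j=0}^{i-1}\big(\alpha_iu^{2^i-2^{j+1}+1}\big)^{2^{m-j}}.$$

To bound $\mu_\Phi$, we will count the $u$'s that produce nonzero summands in (4.20).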

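For each $u \ne 0$, we have a homomorphism $\chi_u : \{v \in \mathbb{F}_{2^m} : \mathrm{Tr}(\frac{v}{u^2}) = 0\} \to \{\pm1\}$ defined
by $\chi_u(v) := (-1)^{\mathrm{Tr}[p(u)v]}$. Pick $u \ne 0$ for which there exists a $v$ such that both $\mathrm{Tr}(\frac{v}{u^2}) = 0$
and $\mathrm{Tr}[p(u)v] = 1$. Then $\chi_u(v) = -1$, and so the kernel of $\chi_u$ is the same size as the coset
$\{v \in \mathbb{F}_{2^m} : \mathrm{Tr}(\frac{v}{u^2}) = 0, \chi_u(v) = -1\}$, meaning the summand associated with $u$ in (4.20) is zero.
Hence, the nonzero summands in (4.20) require $\mathrm{Tr}(\frac{v}{u^2}) = 0 \Rightarrow \mathrm{Tr}[p(u)v] = 0$. This is certainly
possible whenever $p(u) = 0$. Exponentiation gives

$$p(u)^{2^{t-1}} = \sum_{i=1}^t\sum_{j=0}^{i-1}\alpha_i^{2^{t-1-j}}u^{2^{t+i-1-j}-2^t+2^{t-1-j}},$$

which has degree $2^{2t-1} - 2^{t-1}$. Thus, $p(u) = 0$ has at most $2^{2t-1} - 2^{t-1}$ solutions, and each such
$u$ produces a summand in (4.20) of size $2^{m-1}$. Next, we consider the $u$'s for which $\mathrm{Tr}(\frac{v}{u^2}) = 0 \Rightarrow
\mathrm{Tr}[p(u)v] = 0$ and $p(u) \ne 0$. In this case, the hyperplanes defined by $\mathrm{Tr}(\frac{v}{u^2}) = 0$ and $\mathrm{Tr}[p(u)v] = 0$
are parallel, and so $p(u) = \frac{1}{u^2}$. Here,

$$u^{2^t}p(u)^{2^{t-1}} = \sum_{i=1}^t\sum_{j=0}^{i-1}\alpha_i^{2^{t-1-j}}u^{2^{t+i-1-j}+2^{t-1-j}} = 1,$$

which has degree $2^{2t-1} + 2^{t-1}$. Thus, $p(u) = \frac{1}{u^2}$ has at most $2^{2t-1} + 2^{t-1}$ solutions, and each
such $u$ produces a summand in (4.20) of size $2^{m-1}$. We can now continue the bound from (4.20):

$$2^{2m}\mu_\Phi^2 \le 2^m + 2\big(2^{2t-1} - 2^{t-1} + 2^{2t-1} + 2^{t-1}\big)2^{m-1} \le 2^{m+2t+1}.$$

From here, isolating $\mu_\Phi$ gives the claim.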
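Lastly, for average coherence, pick some $x \in \mathbb{F}_{2^m}$. Then summing the entries in the $x$th row gives

$$\sum_{\alpha\in\mathbb{F}_{2^m}^{t+1}}\Phi_{x\alpha} = \frac{1}{\sqrt{2^m}}\Big(\sum_{\alpha_0\in\mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_0x]}\Big)\prod_{i=1}^t\Big(\sum_{\alpha_i\in\mathbb{F}_{2^m}}(-1)^{\mathrm{Tr}[\alpha_ix^{2^i+1}]}\Big) = \begin{cases}2^{(t+1/2)m}, & x = 0,\\ 0, & x \ne 0.\end{cases}$$

That is, the frame elements sum to a multiple of an identity basis element: $\sum_{\alpha\in\mathbb{F}_{2^m}^{t+1}}\varphi_\alpha = 2^{(t+1/2)m}\delta_0$.
Since every entry in row $x = 0$ is $\frac{1}{\sqrt{2^m}}$, we have $\langle\varphi_{\alpha'}, \sum_{\alpha\in\mathbb{F}_{2^m}^{t+1}}\varphi_\alpha\rangle = 2^{tm}$ for every $\alpha' \in \mathbb{F}_{2^m}^{t+1}$,
and so by Lemma 48(i), we are done. □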

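Example 61. To illustrate the bounds in Theorem 60, we consider the example where $m = 4$ and
$t = 1$. This is a $16 \times 256$ code-based frame $\Phi$ with $\mu_\Phi = \frac{1}{2} \le \frac{1}{\sqrt{2}} = \frac{1}{\sqrt{2^{m-2t-1}}}$ and $\nu_\Phi = \frac{1}{17} \le \frac{1}{8} = \frac{\mu_\Phi}{\sqrt{M}}$.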
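Example 61 can be checked with a sketch in Python/NumPy. Finite-field arithmetic is implemented by hand: we assume the representation of $\mathbb{F}_{16}$ as polynomials over $\mathbb{F}_2$ modulo $x^4 + x + 1$ (any irreducible quartic gives an isomorphic field), with field addition as bitwise XOR. Columns are indexed by tuples $(\alpha_0, \ldots, \alpha_t)$ in the order produced by `itertools.product`, which is an arbitrary but harmless choice.

```python
import itertools
import numpy as np

m = 4
MODULUS = 0b10011  # x^4 + x + 1, an irreducible (primitive) polynomial realizing GF(2^4)

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^m), reducing modulo MODULUS."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= MODULUS
    return result

def gf_pow(a, e):
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def gf_trace(z):
    """Tr(z) = z + z^2 + ... + z^(2^(m-1)); the result is 0 or 1."""
    total, power = 0, z
    for _ in range(m):
        total ^= power
        power = gf_mul(power, power)
    return total

def code_based_frame(t):
    """Entries (4.19): 2^(-m/2) (-1)^Tr[a_0 x + sum_i a_i x^(2^i + 1)]."""
    q = 2 ** m
    columns = list(itertools.product(range(q), repeat=t + 1))
    Phi = np.empty((q, len(columns)))
    for x in range(q):
        powers = [gf_pow(x, 2 ** i + 1) for i in range(1, t + 1)]
        for c, alpha in enumerate(columns):
            z = gf_mul(alpha[0], x)
            for a_i, x_pow in zip(alpha[1:], powers):
                z ^= gf_mul(a_i, x_pow)          # addition in GF(2^m) is XOR
            Phi[x, c] = (-1) ** gf_trace(z)
    return Phi / np.sqrt(q)

t = 1
Phi = code_based_frame(t)
M, N = Phi.shape
off = Phi.T @ Phi - np.eye(N)
mu = float(np.abs(off).max())
nu = float(np.abs(off.sum(axis=1)).max() / (N - 1))
print(f"{M} x {N}: mu = {mu:.4f}, bound = {1 / np.sqrt(2 ** (m - 2 * t - 1)):.4f}, nu = {nu:.4f}")
```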

4.3 Fundamental limits on worst-case coherence 

In many applications of frames, performance is dictated by worst-case coherence [11, 35, 62, 84, 103, 
129, 134, 136, 149]. It is therefore particularly important to understand which worst-case coherence 
values are achievable. To this end, the Welch bound is commonly used in the literature. When 
worst-case coherence achieves the Welch bound, the frame is equiangular and tight [129]. However, 
equiangular tight frames cannot have more vectors than the square of the spatial dimension [129], 
meaning the Welch bound is not tight whenever $N > M^2$. When the number of vectors $N$ is
exceedingly large, the following theorem gives a better bound: 

Theorem 62 ([5, 109]). Every sufficiently large $M \times N$ unit norm frame with $N \ge 2M$ and worst-case
coherence $\mu \le \frac{1}{2}$ satisfies

$$\mu^2\log\frac{1}{\mu} \ge \frac{C\log N}{M} \qquad (4.21)$$

for some constant $C > 0$.

For a fixed worst-case coherence $\mu < \frac{1}{2}$, this bound indicates that the number of vectors $N$ cannot
exceed some exponential in the spatial dimension $M$, that is, $N \le a^M$ for some $a > 0$. However,
since the constant $C$ is not established in this theorem, it is unclear which base $a$ is appropriate for
each $\mu$. The following theorem is a little more explicit in this regard:

Theorem 63 ([106, 146]). Every $M \times N$ unit norm frame has worst-case coherence $\mu \ge 1 - 2N^{-1/(M-1)}$.
Furthermore, taking $N = \Theta(a^M)$, this lower bound goes to $1 - \frac{2}{a}$ as $M \to \infty$.

For many applications, it does not make sense to use a complex frame, but the bound in Theorem 63
is known to be loose for real frames [53]. We therefore improve Theorems 62 and 63 for the
case of real unit norm frames:

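Theorem 64. Every real $M \times N$ unit norm frame has worst-case coherence

$$\mu \ge \cos\bigg[\pi\bigg(\frac{M-1}{N\pi^{1/2}}\cdot\frac{\Gamma(\frac{M-1}{2})}{\Gamma(\frac{M}{2})}\bigg)^{\frac{1}{M-1}}\bigg]. \qquad (4.22)$$

Furthermore, taking $N = \Theta(a^M)$, this lower bound goes to $\cos(\frac{\pi}{a})$ as $M \to \infty$.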
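To compare (4.22) with the earlier bounds, one can simply evaluate them. The sketch below (Python standard library only) computes the Welch bound and the bounds of Theorems 63 and 64, working with logarithms and `math.lgamma` so that $N = a^M$ can be astronomically large. The choices $a = 4$ and $M = 400$ for probing the limits $1 - \frac{2}{a}$ and $\cos(\frac{\pi}{a})$ are ours.

```python
import math

def welch_bound(M, N):
    return math.sqrt((N - M) / (M * (N - 1)))

def theorem63_bound(M, N):
    """1 - 2 N^{-1/(M-1)}, evaluated through logarithms so that huge integer N is fine."""
    return 1 - 2 * math.exp(-math.log(N) / (M - 1))

def theorem64_bound(M, N):
    """cos[pi ((M-1) Gamma((M-1)/2) / (N sqrt(pi) Gamma(M/2)))^{1/(M-1)}], via log-gamma."""
    log_inner = (math.log(M - 1) + math.lgamma((M - 1) / 2)
                 - math.log(N) - 0.5 * math.log(math.pi) - math.lgamma(M / 2))
    return math.cos(math.pi * math.exp(log_inner / (M - 1)))

for N in (10, 55):
    print(N, round(welch_bound(3, N), 4), round(theorem63_bound(3, N), 4), round(theorem64_bound(3, N), 4))

a, M = 4, 400
print(theorem63_bound(M, a ** M), 1 - 2 / a)                 # approaches 1 - 2/a
print(theorem64_bound(M, a ** M), math.cos(math.pi / a))     # approaches cos(pi/a)
```

The limits converge slowly in $M$ because of the factor $N^{-1/(M-1)}$, so the printed values agree with $1 - \frac{2}{a}$ and $\cos(\frac{\pi}{a})$ only to a couple of decimal places at $M = 400$.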

Before proving this theorem, we first consider the special case where the dimension is M = 3: 

Lemma 65. Given $N$ points on the unit sphere $\mathbb{S}^2 \subseteq \mathbb{R}^3$, the smallest angle between points is
$\alpha \le 2\cos^{-1}(1 - \frac{2}{N})$.

Proof. We first claim there exists a closed spherical cap in $\mathbb{S}^2$ with area $\frac{4\pi}{N}$ that contains two of the $N$ points. Suppose otherwise, and take $\gamma$ to be the angular radius of a spherical cap with area $\frac{4\pi}{N}$. That is, $\gamma$ is the angle between the center of the cap and every point on the boundary. Since the cap is closed, we must have that the smallest angle $\alpha$ between any two of our $N$ points satisfies $\alpha>2\gamma$. Let $C(p,\theta)$ denote the closed spherical cap centered at $p\in\mathbb{S}^2$ of angular radius $\theta$, and let $P$ denote our set of $N$ points. Then we know that the caps $C(p,\gamma)$, $p\in P$, are disjoint, $\frac{\alpha}{2}>\gamma$, and $\bigcup_{p\in P}C(p,\frac{\alpha}{2})\subseteq\mathbb{S}^2$, and so taking 2-dimensional Hausdorff measures on the sphere gives

$$\mathcal{H}^2(\mathbb{S}^2)=4\pi=\mathcal{H}^2\Big(\bigcup_{p\in P}C(p,\gamma)\Big)<\mathcal{H}^2\Big(\bigcup_{p\in P}C(p,\tfrac{\alpha}{2})\Big)\leq\mathcal{H}^2(\mathbb{S}^2),$$

a contradiction.

Since two of the points reside in a spherical cap of area $\frac{4\pi}{N}$, we know $\alpha$ is no more than twice the radius of this cap. We use spherical coordinates to relate the cap's area to the radius: $\mathcal{H}^2(C(\cdot,\gamma))=2\pi\int_0^\gamma\sin\phi\,d\phi=2\pi(1-\cos\gamma)$. Therefore, when $\mathcal{H}^2(C(\cdot,\gamma))=\frac{4\pi}{N}$, we have $\gamma=\cos^{-1}(1-\frac{2}{N})$, and so $\alpha\leq2\gamma$ gives the result. □
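Lemma 65 holds for every configuration of $N$ points, so any concrete configuration can serve as a sanity check. The sketch below (helper names are ours) draws random points on $\mathbb{S}^2$ by normalizing Gaussian vectors, computes the smallest pairwise angle by brute force, and compares it with $2\cos^{-1}(1-\frac{2}{N})$. Random trials of course prove nothing; they only illustrate the statement and the cap-area formula $2\pi(1-\cos\gamma)$ used in the proof.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform point on S^2: normalize a standard Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(x * x for x in v))
        if r > 1e-12:
            return [x / r for x in v]

def smallest_angle(points):
    """Smallest angle between any two of the points (brute force over all pairs)."""
    best = math.pi
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dot = sum(a * b for a, b in zip(points[i], points[j]))
            best = min(best, math.acos(max(-1.0, min(1.0, dot))))  # clamp rounding error
    return best

def lemma65_bound(N):
    """2 arccos(1 - 2/N): twice the angular radius of a cap of area 4 pi / N."""
    return 2 * math.acos(1 - 2 / N)

if __name__ == "__main__":
    rng = random.Random(0)
    for N in (4, 12, 50):
        pts = [random_unit_vector(rng) for _ in range(N)]
        print(N, smallest_angle(pts), lemma65_bound(N))
```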

Figure 4.1: Different bounds on worst-case coherence for $M=3$, $N=3,\ldots,55$. Stars give numerically determined optimal worst-case coherence of $N$ real unit vectors, found in [53]. Dotted curve gives Welch bound, dash-dotted curve gives bound from Theorem 63, dashed curve gives bound from Theorem 64, and solid curve gives bound from Theorem 66.

Theorem 66. Every real $3\times N$ unit norm frame has worst-case coherence $\mu\geq1-\frac{4}{N}+\frac{2}{N^2}$.

Proof. Packing $N$ unit vectors in $\mathbb{R}^3$ corresponds to packing $2N$ antipodal points in $\mathbb{S}^2$, and so Lemma 65 gives $\alpha\leq2\cos^{-1}(1-\frac{1}{N})$, where $\alpha$ is the smallest angle between the $2N$ points. Applying the double angle formula to

$$\mu=\cos\alpha\geq\cos\Big[2\cos^{-1}\Big(1-\frac{1}{N}\Big)\Big]$$

gives the result. □
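For concreteness, the double-angle step in the proof of Theorem 66 expands as follows, using $\cos2\theta=2\cos^2\theta-1$ with $\theta=\cos^{-1}(1-\frac{1}{N})$:

$$\mu\geq\cos\Big[2\cos^{-1}\Big(1-\frac{1}{N}\Big)\Big]=2\Big(1-\frac{1}{N}\Big)^2-1=2\Big(1-\frac{2}{N}+\frac{1}{N^2}\Big)-1=1-\frac{4}{N}+\frac{2}{N^2}.$$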



Now that we understand the special case where M = 3, we tackle the general case: 

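Proof of Theorem 64. As in the proof of Theorem 66, we relate packing $N$ unit vectors to packing $2N$ points in the hypersphere $\mathbb{S}^{M-1}\subseteq\mathbb{R}^M$. The argument in the proof of Lemma 65 generalizes so that two of the $2N$ points must reside in some closed hyperspherical cap of hypersurface area $\frac{1}{2N}\mathcal{H}^{M-1}(\mathbb{S}^{M-1})$. Therefore, the smallest angle $\alpha$ between those points is no more than twice the radius of this cap. Let $C(\gamma)$ denote a hyperspherical cap of angular radius $\gamma$. Then we use hyperspherical coordinates to get

$$\begin{aligned}\mathcal{H}^{M-1}(C(\gamma))&=\int_0^\gamma\int_0^\pi\cdots\int_0^\pi\int_0^{2\pi}\sin^{M-2}(\phi_1)\cdots\sin(\phi_{M-2})\,d\phi_{M-1}\cdots d\phi_1\\&=2\pi\prod_{j=1}^{M-3}\frac{\sqrt{\pi}\,\Gamma(\frac{j+1}{2})}{\Gamma(\frac{j}{2}+1)}\int_0^\gamma\sin^{M-2}\phi\,d\phi\\&=\frac{2\pi^{(M-1)/2}}{\Gamma(\frac{M-1}{2})}\int_0^\gamma\sin^{M-2}\phi\,d\phi.\end{aligned}\tag{4.23}$$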
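We wish to solve for $\gamma$, but analytically inverting $\int_0^\gamma\sin^{M-2}\phi\,d\phi$ is difficult. Instead, we use $\sin\phi\geq\frac{2\phi}{\pi}$ for $\phi\in[0,\frac{\pi}{2}]$. Note that we do not lose generality by forcing $\gamma\leq\frac{\pi}{2}$, since this is guaranteed with $N\geq2$. Continuing (4.23) gives

$$\mathcal{H}^{M-1}(C(\gamma))\geq\frac{2\pi^{(M-1)/2}}{\Gamma(\frac{M-1}{2})}\int_0^\gamma\Big(\frac{2\phi}{\pi}\Big)^{M-2}d\phi=\frac{(2\gamma)^{M-1}}{(M-1)\pi^{(M-3)/2}\Gamma(\frac{M-1}{2})}.\tag{4.24}$$

Using the formula for a hypersphere's hypersurface area, we can express the left-hand side of (4.24):

$$\frac{(2\gamma)^{M-1}}{(M-1)\pi^{(M-3)/2}\Gamma(\frac{M-1}{2})}\leq\mathcal{H}^{M-1}(C(\gamma))=\frac{1}{2N}\mathcal{H}^{M-1}(\mathbb{S}^{M-1})=\frac{\pi^{M/2}}{N\,\Gamma(\frac{M}{2})}.$$

Isolating $2\gamma$ above and using $\alpha\leq2\gamma$ and $\mu=\cos\alpha$ gives (4.22). The second part of the result comes from a simple application of Stirling's approximation. □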
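The two estimates in this proof can be checked numerically. In the sketch below (function names are ours), `cap_area_exact` evaluates the last line of (4.23) with composite Simpson's rule, `cap_area_lower` is the right-hand side of (4.24), and `two_gamma_max` is the largest $2\gamma$ permitted once (4.24) is combined with the cap area $\frac{1}{2N}\mathcal{H}^{M-1}(\mathbb{S}^{M-1})$. This is an illustration of the inequalities, not part of the proof.

```python
import math

def sphere_area(d):
    """Hypersurface area of the unit sphere S^{d-1} in R^d: 2 pi^{d/2} / Gamma(d/2)."""
    return 2 * math.pi ** (d / 2) / math.gamma(d / 2)

def cap_area_exact(M, gamma, steps=2000):
    """Last line of (4.23): area of a cap of angular radius gamma on S^{M-1}.

    The integral of sin^{M-2} is approximated by composite Simpson's rule (steps must be even).
    """
    h = gamma / steps
    f = lambda phi: math.sin(phi) ** (M - 2)
    total = f(0.0) + f(gamma)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(k * h)
    return sphere_area(M - 1) * total * h / 3  # 2 pi^{(M-1)/2} / Gamma((M-1)/2) = area of S^{M-2}

def cap_area_lower(M, gamma):
    """Right-hand side of (4.24), obtained from sin(phi) >= 2 phi / pi on [0, pi/2]."""
    return (2 * gamma) ** (M - 1) / (
        (M - 1) * math.pi ** ((M - 3) / 2) * math.gamma((M - 1) / 2))

def two_gamma_max(M, N):
    """Solve (4.24) = H(S^{M-1}) / (2N) for 2*gamma; its cosine is the bound (4.22)."""
    ratio = (M - 1) * math.gamma((M - 1) / 2) / (N * math.sqrt(math.pi) * math.gamma(M / 2))
    return math.pi * ratio ** (1 / (M - 1))

if __name__ == "__main__":
    for M in (3, 5, 8):
        g = math.pi / 3
        print(M, cap_area_lower(M, g), cap_area_exact(M, g))
```

By construction, `math.cos(two_gamma_max(M, N))` is the right-hand side of (4.22).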

In [53], numerical results are given for $M=3$, and we compare these results to Theorems 63 and 64 in Figure 4.1. Considering this figure, we note that the bound in Theorem 63 is inferior to the maximum of the Welch bound and the bound in Theorem 64, at least when $M=3$. This illustrates the degree to which Theorem 64 improves the bound in Theorem 63 for real frames. In fact, since $\cos(\frac{\pi}{a})>1-\frac{2}{a}$ for all $a>2$, the bound for real frames in Theorem 64 is asymptotically better than the bound for complex frames in Theorem 63. Moreover, for $M=2$, Theorem 64 says $\mu\geq\cos(\frac{\pi}{N})$, and [19] proved this bound to be tight for every $N\geq2$. Lastly, Figure 4.1 illustrates that Theorem 66 improves the bound in Theorem 64 for the case $M=3$.
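The curves in Figure 4.1 (other than the numerical optima from [53], which we do not reproduce) can be tabulated directly. The sketch below evaluates the Welch bound $\sqrt{\frac{N-M}{M(N-1)}}$ together with the bounds of Theorems 63, 64 and 66 for $M=3$ and $N=3,\ldots,55$. For $M=3$, (4.22) simplifies to $\cos(2\sqrt{\pi/N})$ because $\Gamma(1)=1$ and $\Gamma(\frac{3}{2})=\frac{\sqrt{\pi}}{2}$; the function names are ours.

```python
import math

M = 3  # Figure 4.1 fixes the spatial dimension

def welch(N):
    """Welch bound sqrt((N - M) / (M (N - 1)))."""
    return math.sqrt((N - M) / (M * (N - 1)))

def theorem63(N):
    return 1 - 2 * N ** (-1 / (M - 1))

def theorem64(N):
    # (4.22) at M = 3 reduces to cos(2 sqrt(pi / N))
    return math.cos(2 * math.sqrt(math.pi / N))

def theorem66(N):
    return 1 - 4 / N + 2 / N ** 2

TABLE = [(N, welch(N), theorem63(N), theorem64(N), theorem66(N)) for N in range(3, 56)]

if __name__ == "__main__":
    print(" N  Welch  Thm63  Thm64  Thm66")
    for N, w, b63, b64, b66 in TABLE[::8]:
        print(f"{N:2d}  {w:.3f}  {b63:.3f}  {b64:.3f}  {b66:.3f}")
```

Scanning the table reproduces the two comparisons made in the text: $\max(\text{Welch},\text{Thm 64})$ dominates Theorem 63, and Theorem 66 dominates Theorem 64, over the whole plotted range.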

In many applications, large dictionaries are built to obtain sparse reconstruction, but the known guarantees on sparse reconstruction place certain requirements on worst-case coherence. Asymptotically, the bounds in Theorems 63 and 64 indicate that certain exponentially large dictionaries will not satisfy these requirements. For example, if $N=\Theta(3^M)$, then $\mu=\Omega(\frac{1}{3})$ by Theorem 63, and if the frame is real, we have $\mu=\Omega(\frac{1}{2})$ by Theorem 64. Such a dictionary will only work for sparse reconstruction if the sparsity level $K$ is sufficiently small; deterministic guarantees require $K<\mu^{-1}$ [62, 134], while probabilistic guarantees require $K<\mu^{-2}$ [11, 135], and so in this example, the dictionary can, at best, only accommodate sparsity levels that are smaller than 10. Unfortunately, in real-world applications, we can expect the sparsity level to scale with the signal dimension. With this in mind, Theorems 63 and 64 tell us that dictionaries can only be used for sparse reconstruction if $N=O((2+\varepsilon)^M)$ for some sufficiently small $\varepsilon>0$. To summarize, the Welch bound is known to be tight only if $N\leq M^2$, and Theorems 63 and 64 give bounds which are asymptotically better than the Welch bound whenever $N=\Omega(2^M)$. When $N$ is between $M^2$ and $2^M$, the best bound to date is the (loose) Welch bound, and so more work needs to be done to bound worst-case coherence in this parameter region.
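The arithmetic behind the example above is short enough to spell out. The sketch below (function names are ours) computes the limiting bounds $1-\frac{2}{a}$ (Theorem 63) and $\cos(\frac{\pi}{a})$ (Theorem 64) for $N=\Theta(a^M)$, and the sparsity ceilings $\mu^{-1}$ and $\mu^{-2}$ quoted in the text. These ceilings are the requirements cited above, used here only as rough thresholds.

```python
import math

def limits(a):
    """Limits of the Theorem 63 and Theorem 64 bounds when N = Theta(a^M) and M -> infinity."""
    return 1 - 2 / a, math.cos(math.pi / a)

def sparsity_ceilings(mu):
    """Ceilings on the sparsity level K quoted in the text: 1/mu (deterministic), 1/mu^2 (probabilistic)."""
    return 1 / mu, 1 / mu ** 2

if __name__ == "__main__":
    for a in (2.5, 3.0, 4.0):
        mu_complex, mu_real = limits(a)
        print(a, mu_complex, mu_real, sparsity_ceilings(mu_complex))
```

At $a=3$ the complex-frame limit is $\frac{1}{3}$, so the probabilistic ceiling is $9$, which is the "smaller than 10" in the text; at $a=2$ both limits collapse to $0$, which is why only bases slightly above $2$ remain usable.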

4.4 Reducing average coherence 

In [11], average coherence is used to derive a number of guarantees on sparse signal processing. Since 
average coherence is so new to the frame theory literature, this section will investigate how average 
coherence relates to worst-case coherence and the spectral norm. We start with a definition: 

Definition 67 (Wiggling and flipping equivalent frames). We say the frames $\Phi$ and $\Psi$ are wiggling equivalent if there exists a diagonal matrix $D$ of unimodular entries such that $\Psi=\Phi D$. Furthermore, they are flipping equivalent if $D$ is real, having only $\pm1$'s on the diagonal.

The terms "wiggling" and "flipping" are inspired by the fact that individual frame elements of such equivalent frames are related by simple unitary operations. Note that every frame with $N$ nonzero frame elements belongs to a flipping equivalence class of size $2^N$, while being wiggling equivalent to uncountably many frames. The importance of this type of frame equivalence is, in part, due to the following lemma, which characterizes the shared geometry of wiggling equivalent frames:

Lemma 68 (Geometry of wiggling equivalent frames). Wiggling equivalence preserves the norms of 
frame elements, the worst-case coherence, and the spectral norm. 

Proof. Take two frames $\Phi$ and $\Psi$ such that $\Psi=\Phi D$. The first claim is immediate. Next, the Gram matrices are related by $\Psi^*\Psi=D^*\Phi^*\Phi D$. Since corresponding off-diagonal entries are equal in modulus, we know the worst-case coherences are equal. Finally, since $DD^*=I$, we have $\|\Psi\|_2^2=\|\Psi\Psi^*\|_2=\|\Phi DD^*\Phi^*\|_2=\|\Phi\Phi^*\|_2=\|\Phi\|_2^2$, and so we are done. □
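Lemma 68 can be illustrated numerically with plain complex arithmetic. The sketch below (all helper names are ours) draws a random complex unit norm frame, multiplies each column by a single random unimodular scalar, and compares column norms, worst-case coherence, and the frame operator $\Phi\Phi^*$. Equal frame operators give equal spectral norms, since $\|\Phi\|_2^2=\|\Phi\Phi^*\|_2$.

```python
import cmath
import math
import random

def random_frame(M, N, rng):
    """N random unit norm columns in C^M (complex Gaussian entries, then normalized)."""
    cols = []
    for _ in range(N):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(M)]
        r = math.sqrt(sum(abs(z) ** 2 for z in v))
        cols.append([z / r for z in v])
    return cols

def wiggle(cols, rng):
    """Psi = Phi D with D = diag(exp(i theta_n))."""
    out = []
    for c in cols:
        d = cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))  # one unimodular scalar per column
        out.append([d * z for z in c])
    return out

def inner(u, v):
    """<u, v> = sum_m u_m conj(v_m)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def worst_case_coherence(cols):
    N = len(cols)
    return max(abs(inner(cols[i], cols[j])) for i in range(N) for j in range(N) if i != j)

def frame_operator(cols):
    """Phi Phi^* = sum_n phi_n phi_n^*, an M x M matrix."""
    M = len(cols[0])
    return [[sum(c[r] * c[s].conjugate() for c in cols) for s in range(M)] for r in range(M)]

if __name__ == "__main__":
    rng = random.Random(0)
    Phi = random_frame(3, 7, rng)
    Psi = wiggle(Phi, rng)
    print(worst_case_coherence(Phi), worst_case_coherence(Psi))
```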

Wiggling and flipping equivalence are not entirely new to frame theory. For a real equiangular tight frame $\Phi$, the Gram matrix $\Phi^*\Phi$ is completely determined by the sign pattern of the off-diagonal entries, which can in turn be interpreted as the Seidel adjacency matrix of a graph $G_\Phi$. As such, flipping a frame element $\varphi\in\Phi$ has the effect of negating the corresponding row and column in the Gram matrix, which further corresponds to switching the adjacency rule for that vertex $v\in V(G_\Phi)$ in the graph: vertices are adjacent to $v$ after switching precisely when they were not adjacent before switching. Graphs are called switching equivalent if there is a sequence of switching operations that produces one graph from the other; this equivalence was introduced in [139] and was later extensively studied by Seidel in [122, 123]. Since flipping equivalent real equiangular tight frames correspond to switching equivalent graphs, the terms have become interchangeable. For example, [24] uses switching (i.e., wiggling and flipping) equivalence to make progress on an important problem in frame theory called the Paulsen problem, which asks how close a nearly unit norm, nearly tight frame must be to a unit norm tight frame.
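The correspondence between flipping and switching can be seen on the smallest real equiangular tight frame, three unit vectors in $\mathbb{R}^2$ at mutual angles of $120^\circ$, whose off-diagonal Gram entries all equal $-\frac{1}{2}$. In the sketch below (helper names are ours), the sign pattern is read as a Seidel matrix under an assumed convention: $-1$ for adjacent (negative inner product), $+1$ for nonadjacent, $0$ on the diagonal. Flipping one vector negates its row and column, which toggles that vertex's adjacencies.

```python
import math

def gram(cols):
    """Real Gram matrix Phi^T Phi."""
    return [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]

def seidel(G):
    """Off-diagonal sign pattern of G; assumed convention: -1 adjacent, +1 nonadjacent, 0 diagonal."""
    n = len(G)
    return [[0 if i == j else (-1 if G[i][j] < 0 else 1) for j in range(n)] for i in range(n)]

def flip(cols, k):
    """Flip only frame element k."""
    return [[-x for x in c] if n == k else list(c) for n, c in enumerate(cols)]

# Three equiangular unit vectors in R^2: every pair has inner product -1/2.
mercedes = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)] for k in range(3)]

if __name__ == "__main__":
    print(seidel(gram(mercedes)))           # all pairs adjacent (complete graph)
    print(seidel(gram(flip(mercedes, 0))))  # vertex 0 switched
```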

Now that we understand wiggling and flipping equivalence, we are ready for the main idea behind 
this section. Suppose we are given a unit norm frame with acceptable spectral norm and worst-case 
coherence, but we also want the average coherence to satisfy (SCP-2). Then by Lemma 68, all of the 
wiggling equivalent frames will also have acceptable spectral norm and worst-case coherence, and 
so it is reasonable to check these frames for good average coherence. In fact, the following theorem 
guarantees that at least one of the flipping equivalent frames will have good average coherence, with 
only modest requirements on the original frame's redundancy. 

Theorem 69 (Constructing frames with low average coherence). Let $\Phi$ be an $M\times N$ unit norm frame with $M<\frac{N-1}{4\log4N}$. Then there exists a frame $\Psi$ that is flipping equivalent to $\Phi$ and satisfies $\nu_\Psi\leq\frac{\mu_\Psi}{\sqrt{M}}$.

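Proof. Let $\{R_n\}_{n=1}^N$ be a Rademacher sequence that independently takes values $\pm1$, each with probability $\frac{1}{2}$. We use this sequence to randomly flip $\Phi$: define $Z:=\Phi\,\mathrm{diag}\{R_n\}_{n=1}^N$. Note that if $\mathrm{Pr}(\nu_Z\leq\frac{\mu_Z}{\sqrt{M}})>0$, we are done. Fix some $i\in\{1,\ldots,N\}$. Since $\langle z_i,z_j\rangle=R_iR_j\langle\varphi_i,\varphi_j\rangle$ and $\mu_Z=\mu_\Phi$ by Lemma 68, we have

$$\mathrm{Pr}\bigg(\frac{1}{N-1}\bigg|\sum_{\substack{j=1\\j\neq i}}^N\langle z_i,z_j\rangle\bigg|>\frac{\mu_Z}{\sqrt{M}}\bigg)=\mathrm{Pr}\bigg(\bigg|\sum_{\substack{j=1\\j\neq i}}^NR_j\langle\varphi_i,\varphi_j\rangle\bigg|>\frac{(N-1)\mu_\Phi}{\sqrt{M}}\bigg).\tag{4.25}$$

We can view $\sum_{j\neq i}R_j\langle\varphi_i,\varphi_j\rangle$ as a sum of $N-1$ independent zero-mean complex random variables that are bounded by $\mu_\Phi$. We can therefore use a complex version of Hoeffding's inequality [83] (see, e.g., Lemma 3.8 of [10]) to bound the probability expression in (4.25) as $\leq4e^{-(N-1)/4M}$. From here, a union bound over all $N$ choices for $i$ gives $\mathrm{Pr}(\nu_Z\leq\frac{\mu_Z}{\sqrt{M}})\geq1-4Ne^{-(N-1)/4M}$, and so $M<\frac{N-1}{4\log4N}$ implies $\mathrm{Pr}(\nu_Z\leq\frac{\mu_Z}{\sqrt{M}})>0$, as desired. □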
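The random construction in this proof is easy to simulate for real frames. The sketch below (helper names are ours) computes average coherence through the identity $\sum_{j\neq i}\langle\varphi_i,\varphi_j\rangle=\langle\varphi_i,\sum_j\varphi_j\rangle-\|\varphi_i\|^2$, draws Rademacher flips $Z=\Phi\,\mathrm{diag}\{R_n\}$, and reports the empirical frequency of $\nu_Z\leq\frac{\mu_Z}{\sqrt{M}}$ next to the union-bound guarantee $1-4Ne^{-(N-1)/4M}$. The empirical frequency depends on the random frame and seed; only the union bound is a theorem.

```python
import math
import random

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

def worst_case_coherence(cols):
    N = len(cols)
    return max(abs(inner(cols[i], cols[j])) for i in range(N) for j in range(i + 1, N))

def average_coherence(cols):
    """nu = max_i |sum_{j != i} <phi_i, phi_j>| / (N - 1), computed in O(MN) via the column sum."""
    N, M = len(cols), len(cols[0])
    total = [sum(c[m] for c in cols) for m in range(M)]
    # sum_{j != i} <phi_i, phi_j> = <phi_i, sum_j phi_j> - ||phi_i||^2
    return max(abs(inner(c, total) - inner(c, c)) for c in cols) / (N - 1)

def union_bound(M, N):
    """Lower bound 1 - 4N exp(-(N-1)/(4M)) on Pr(nu_Z <= mu_Z / sqrt(M)) from the proof."""
    return 1 - 4 * N * math.exp(-(N - 1) / (4 * M))

def random_flip(cols, rng):
    """Z = Phi diag(R_n): an independent uniform sign per column."""
    signs = [rng.choice((-1, 1)) for _ in cols]
    return [[s * x for x in c] for s, c in zip(signs, cols)]

def random_unit_frame(M, N, rng):
    cols = []
    for _ in range(N):
        v = [rng.gauss(0.0, 1.0) for _ in range(M)]
        r = math.sqrt(inner(v, v))
        cols.append([x / r for x in v])
    return cols

if __name__ == "__main__":
    rng = random.Random(0)
    M, N = 2, 100  # satisfies M < (N - 1) / (4 log 4N)
    Phi = random_unit_frame(M, N, rng)
    target = worst_case_coherence(Phi) / math.sqrt(M)
    hits = sum(average_coherence(random_flip(Phi, rng)) <= target for _ in range(200))
    print("union bound:", union_bound(M, N), "empirical:", hits / 200)
```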

While Theorem 69 guarantees the existence of a flipping equivalent frame with good average
coherence, the result does not describe how to find it. Certainly, one could check all $2^N$ frames
in the flipping equivalence class, but such a procedure is computationally slow. As an alternative, 
we propose a linear-time flipping algorithm (Algorithm 2). The following theorem guarantees that 
linear-time flipping will produce a frame with good average coherence, but it requires the original 
frame's redundancy to be higher than what suffices in Theorem 69. 
Algorithm 2 Linear-time flipping

Input: An $M\times N$ unit norm frame $\Phi$
Output: An $M\times N$ unit norm frame $\Psi$ that is flipping equivalent to $\Phi$

$\psi_1\leftarrow\varphi_1$   {Keep first frame element}
for $n=2$ to $N$ do
    if $\|\sum_{i=1}^{n-1}\psi_i+\varphi_n\|\leq\|\sum_{i=1}^{n-1}\psi_i-\varphi_n\|$ then
        $\psi_n\leftarrow\varphi_n$   {Keep frame element to make sum length shorter}
    else
        $\psi_n\leftarrow-\varphi_n$   {Flip frame element to make sum length shorter}
    end if
end for
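Algorithm 2 translates almost line for line into Python. The sketch below (the name `linear_time_flip` is ours) keeps a running sum so that each step costs $O(M)$ and the whole pass is $O(MN)$. Ties are resolved toward keeping, matching the non-strict "if" branch. It works for real or complex columns, since it only uses `abs` on the entries.

```python
import math
import random

def linear_time_flip(cols):
    """Sketch of Algorithm 2. Returns (signs, psi) with psi[n] = signs[n] * cols[n]."""
    running = list(cols[0])             # running sum psi_1 + ... + psi_{n-1}; psi_1 = phi_1
    signs, psi = [1], [list(cols[0])]
    for phi in cols[1:]:
        plus = sum(abs(r + p) ** 2 for r, p in zip(running, phi))
        minus = sum(abs(r - p) ** 2 for r, p in zip(running, phi))
        s = 1 if plus <= minus else -1  # keep on ties; otherwise flip to shorten the sum
        signs.append(s)
        psi.append([s * p for p in phi])
        running = [r + s * p for r, p in zip(running, phi)]
    return signs, psi

if __name__ == "__main__":
    rng = random.Random(0)
    M, N = 4, 60
    cols = []
    for _ in range(N):
        v = [rng.gauss(0.0, 1.0) for _ in range(M)]
        r = math.sqrt(sum(x * x for x in v))
        cols.append([x / r for x in v])
    signs, psi = linear_time_flip(cols)
    total = [sum(c[m] for c in psi) for m in range(M)]
    print("||sum psi||^2 =", sum(x * x for x in total), "<= N =", N)
```

The test below checks the invariant $\|\sum_{n=1}^k\psi_n\|^2\leq k$ that drives the proof of Theorem 70.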



Theorem 70. Suppose $N>M^2+3M+3$. Then Algorithm 2 outputs an $M\times N$ frame $\Psi$ that is flipping equivalent to $\Phi$ and satisfies $\nu_\Psi\leq\frac{\mu_\Psi}{\sqrt{M}}$.

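Proof. Considering Lemma 48(iii), it suffices to have $\|\sum_{n=1}^N\psi_n\|^2\leq N$. We will use induction to show $\|\sum_{n=1}^k\psi_n\|^2\leq k$ for $k=1,\ldots,N$. Clearly, $\|\psi_1\|^2=\|\varphi_1\|^2=1\leq1$. Now assume $\|\sum_{n=1}^k\psi_n\|^2\leq k$. Then by our choice for $\psi_{k+1}$ in Algorithm 2, we know that $\|\sum_{n=1}^k\psi_n+\psi_{k+1}\|^2\leq\|\sum_{n=1}^k\psi_n-\psi_{k+1}\|^2$. Expanding both sides of this inequality gives

$$\Big\|\sum_{n=1}^k\psi_n\Big\|^2+2\,\mathrm{Re}\Big\langle\sum_{n=1}^k\psi_n,\psi_{k+1}\Big\rangle+\|\psi_{k+1}\|^2\leq\Big\|\sum_{n=1}^k\psi_n\Big\|^2-2\,\mathrm{Re}\Big\langle\sum_{n=1}^k\psi_n,\psi_{k+1}\Big\rangle+\|\psi_{k+1}\|^2,$$

and so $\mathrm{Re}\langle\sum_{n=1}^k\psi_n,\psi_{k+1}\rangle\leq0$. Therefore,

$$\Big\|\sum_{n=1}^{k+1}\psi_n\Big\|^2=\Big\|\sum_{n=1}^k\psi_n\Big\|^2+2\,\mathrm{Re}\Big\langle\sum_{n=1}^k\psi_n,\psi_{k+1}\Big\rangle+\|\psi_{k+1}\|^2\leq\Big\|\sum_{n=1}^k\psi_n\Big\|^2+\|\psi_{k+1}\|^2\leq k+1,$$

where the last inequality uses the inductive hypothesis. □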



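Example 71. Apply linear-time flipping to reduce average coherence in the following matrix, where $+$ and $-$ denote $+1$ and $-1$:

$$\Phi=\frac{1}{\sqrt{5}}\left[\begin{array}{cccccccccc}+&+&+&+&-&+&+&+&+&-\\+&-&+&+&+&-&-&-&+&-\\+&+&+&+&+&+&+&+&-&+\\-&-&-&+&-&+&+&-&-&-\\-&+&+&-&-&+&-&-&-&-\end{array}\right].$$

Here, $\nu_\Phi\approx0.3778>0.2683\approx\frac{\mu_\Phi}{\sqrt{M}}$, and linear-time flipping produces the flipping pattern $D:=\mathrm{diag}(+-+--++-++)$. Then $\Phi D$ has average coherence $\nu_{\Phi D}\approx0.1556<\frac{\mu_\Phi}{\sqrt{M}}=\frac{\mu_{\Phi D}}{\sqrt{M}}$. This illustrates that the condition $N>M^2+3M+3$ in Theorem 70 is sufficient but not necessary, since here $M=5$ and $N=10$.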
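Example 71 can be reproduced exactly with rational arithmetic. In the sketch below, `ROWS` is our reading of the $\pm$ pattern in the example (the frame is $\frac{1}{\sqrt{5}}$ times this matrix), and the helper names are ours. Inner products of normalized columns are integer sign dot products divided by $M=5$, so `fractions.Fraction` avoids rounding. Flipping runs on the unnormalized sign columns, which does not change any comparison in Algorithm 2, and ties are resolved toward keeping.

```python
import math
from fractions import Fraction

# Our reading of the sign pattern in Example 71; the frame is (1/sqrt(5)) times this matrix.
ROWS = ["++++-++++-", "+-+++---+-", "++++++++-+", "---+-++---", "-++--+----"]
M, N = len(ROWS), len(ROWS[0])
COLS = [[1 if row[n] == "+" else -1 for row in ROWS] for n in range(N)]

def ip(u, v):
    """Exact inner product of two normalized columns: (sign dot product) / M."""
    return Fraction(sum(a * b for a, b in zip(u, v)), M)

def worst_case(cols):
    return max(abs(ip(cols[i], cols[j])) for i in range(N) for j in range(N) if i != j)

def average(cols):
    return max(abs(sum(ip(cols[i], cols[j]) for j in range(N) if j != i)) for i in range(N)) / (N - 1)

def flip_signs(cols):
    """Algorithm 2 on the unnormalized sign columns (scaling does not change any comparison)."""
    running, signs = list(cols[0]), [1]
    for c in cols[1:]:
        plus = sum((r + x) ** 2 for r, x in zip(running, c))
        minus = sum((r - x) ** 2 for r, x in zip(running, c))
        s = 1 if plus <= minus else -1
        signs.append(s)
        running = [r + s * x for r, x in zip(running, c)]
    return signs

signs = flip_signs(COLS)
flipped = [[s * x for x in c] for s, c in zip(signs, COLS)]

if __name__ == "__main__":
    mu = worst_case(COLS)
    print("mu =", float(mu), " mu/sqrt(M) =", float(mu) / math.sqrt(M))
    print("nu before =", float(average(COLS)), " nu after =", float(average(flipped)))
    print("D =", "".join("+" if s > 0 else "-" for s in signs))
```

In exact terms, $\mu_\Phi=\frac{3}{5}$, $\nu_\Phi=\frac{17}{45}$, and $\nu_{\Phi D}=\frac{7}{45}$, which round to the values quoted in the example.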